<?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005/Atom"> <id>https://www.yunseo.kim/</id><title>Yunseo Kim's Study Notes</title><subtitle>Yunseo Kim's personal blog mainly covering mathematics, physics, and engineering.</subtitle> <updated>2026-02-17T02:32:26+09:00</updated> <author> <name>Yunseo Kim</name> <uri>https://www.yunseo.kim/</uri> </author><link rel="self" type="application/atom+xml" href="https://www.yunseo.kim/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://www.yunseo.kim/" /><link rel="alternate" type="text/html" hreflang="ko" href="https://www.yunseo.kim/ko/" /><link rel="alternate" type="text/html" hreflang="ja" href="https://www.yunseo.kim/ja/" /><link rel="alternate" type="text/html" hreflang="zh-TW" href="https://www.yunseo.kim/zh-TW/" /><link rel="alternate" type="text/html" hreflang="es" href="https://www.yunseo.kim/es/" /><link rel="alternate" type="text/html" hreflang="pt-BR" href="https://www.yunseo.kim/pt-BR/" /><link rel="alternate" type="text/html" hreflang="fr" href="https://www.yunseo.kim/fr/" /><link rel="alternate" type="text/html" hreflang="de" href="https://www.yunseo.kim/de/" /><link rel="alternate" type="text/html" hreflang="pl" href="https://www.yunseo.kim/pl/" /><link rel="alternate" type="text/html" hreflang="cs" href="https://www.yunseo.kim/cs/" /> <generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator> <rights> © 2026 Yunseo Kim </rights> <icon>/assets/img/favicons/favicon.ico</icon> <logo>/assets/img/favicons/favicon-96x96.png</logo> <entry><title xml:lang="en">How to Prepare IR Materials</title><link href="https://www.yunseo.kim/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="ja" 
/><link href="https://www.yunseo.kim/zh-TW/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/how-to-prepare-ir-materials/" rel="alternate" type="text/html" hreflang="cs" /><published>2026-01-11T00:00:00+09:00</published> <updated>2026-01-11T00:00:00+09:00</updated> <id>https://www.yunseo.kim/posts/how-to-prepare-ir-materials/</id> <author> <name>Yunseo Kim</name> </author> <category term="Startup" /> <category term="IR" /> <summary xml:lang="en">Learn what IR materials are and what to include in an effective deck to successfully raise investment.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Learn what IR materials are and what to include in an effective deck to successfully raise investment.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="what-are-ir-materials">What are IR materials?</h2><p><strong>IR</strong> is an abbreviation for <strong>Investor Relations</strong>. It is an umbrella term for all materials and activities required to explain and promote a company to investors, build relationships, and raise investment. 
In practice, “IR materials” usually refers to the materials a company presents to investors for fundraising.</p><h2 id="what-to-include-in-ir-materials">What to include in IR materials</h2><p>Because the purpose of IR materials is fundraising, you need to persuasively present—<em>from an investor’s perspective</em>—why they should invest in your company. Accordingly, you should cover the business end-to-end, including a service summary, market landscape, product/service description, competitive landscape, traction, business model, growth plans, and team.</p><ul><li><strong>Pitch Deck</strong>:<ul><li>The goal is to make a <strong>short, strong, and positive first impression</strong> on a broad set of potential investors<li>Used in early-stage fundraising<li>Typically 10–15 slides; concise and highly visual</ul><li><strong>IR Deck</strong>:<ul><li>Provides <strong>in-depth financial information and long-term strategy</strong><li>Shared with professional investors who have started showing meaningful interest and are close to making a decision<li>Enables investors to make a <strong>deeper evaluation and judgment</strong><li>Typically 20–30 slides; includes more detailed information such as <strong>financial plan, market analysis, team, competitive analysis</strong>, etc.</ul></ul><h3 id="missionvision">Mission/Vision</h3><ul><li>What is the essential value we aim to deliver?</ul><blockquote class="prompt-tip"><p>This is essentially the company’s core identity. It’s best to express the company’s mission and vision clearly and concisely in one sentence each at the very beginning of the IR materials.</p></blockquote><h3 id="service-summary">Service summary</h3><h4 id="problem">Problem</h4><ul><li>What market problem does the service aim to solve?<li>How inconvenient/painful is this for consumers?<li>Why is the problem important?<li>Is there demand for solving it? 
Who is the target?</ul><h4 id="solution">Solution</h4><ul><li>Specifically, how will you solve the problem described above?<li>Compared to existing approaches, what benefits do consumers and end users gain?</ul><blockquote class="prompt-tip"><p>Investors are often not domain experts. It’s best to explain the service from a consumer’s perspective rather than a developer’s, and handle technical details separately when follow-up questions come in.</p></blockquote><h3 id="market-size">Market size</h3><p>If you define market size directly in monetary terms, the result can vary significantly depending on the calculation method and variables, and it also carries a relatively higher risk of dispute. It can be safer and more effective to present market size using other indicators such as the number of potential users and the number/frequency of transactions.</p><ul><li><strong>TAM (Total Addressable Market, total market)</strong>: The theoretical maximum market size you could reach when offering the product or service globally, assuming an ideal scenario of achieving 100% global market share excluding all competitors<li><strong>SAM (Serviceable Available Market, serviceable market)</strong>: The realistically serviceable market within the scope the company is pursuing, considering geographic, infrastructure, and regulatory constraints<li><strong>SOM (Serviceable Obtainable Market)</strong>: The market size you can realistically capture early on within SAM, considering competition, company capabilities, and marketing strategy</ul><blockquote class="prompt-tip"><p>When estimating market size, people often cite third-party market research for TAM or SAM and provide specific figures and metrics, while describing SOM—what actually matters immediately for a startup—along the lines of: “If we achieve X% share in this market, we can reach $Y in revenue.” To be honest, when I was first preparing to start a company, my initial internal IR draft did it this way too.</p><p>The problem with this
approach is that, from an investor’s perspective, it’s hard to trust a plan that claims you’ll capture some percentage of the market. You don’t automatically gain market share just because you launch, and vaguely claiming you’ll achieve X% share across all participants in that market is not very persuasive.</p><p>While showing that your TAM and SAM are sufficiently large, it’s important to present a clear logic for how you define your <strong>Immediate Market</strong> (early customer segment) and how you will expand SOM over time by sequentially targeting additional customer segments.</p></blockquote><blockquote class="prompt-tip"><p><strong>Business timing</strong></p><ul><li>Timing matters a lot in business<li>You must be able to explain to investors why this business can succeed <em>now</em> and why they should invest <em>now</em><li>You should present reasons why now is the right time to execute, such as technological feasibility, changes in people’s behavior patterns, social trends, and environmental changes</ul></blockquote><h3 id="productservice-description">Product/Service description</h3><ul><li>What are the key features and functions of the product/service?<li>What is the concrete mechanism/how it works, and what are examples?</ul><h3 id="business-model">Business model</h3><ul><li>How will you make money?<li>Who pays? 
(Because the end user and the paying customer do not always coincide, you must clearly identify who actually generates revenue.)<li>What will you charge for, and how will pricing work?</ul><h3 id="competitive-landscape">Competitive landscape</h3><ul><li>Who are the major competitors?<li>From the <strong>customer’s perspective</strong>, in what ways is our service/product better and what advantages do we have compared with others?<li>Which services do we define as competitors, and which customers will be our primary target?</ul><blockquote class="prompt-tip"><p>If you analyze competitors properly, you can effectively demonstrate to investors that you understand the market landscape.</p></blockquote><h3 id="traction-and-go-to-market-strategy">Traction and go-to-market strategy</h3><ul><li>What is the most important north-star metric for the success of the business?<ul><li>e.g., number of orders, monthly active users (MAU), monthly transaction volume, etc.</ul><li>What traction have you achieved around that metric?<li>What are the company’s main marketing methods and channels?<li>What is the method and cost to acquire new customers?<li>*<strong>What is customer lifetime value (LTV)</strong>?</ul><blockquote class="prompt-info"><p>*<strong>Customer Lifetime Value (LTV)</strong>: A quantified measure of how much total profit a single user generates over the entire period they use the service</p></blockquote><blockquote class="prompt-tip"><p>It’s better to exclude ancillary metrics that are not core KPIs.</p></blockquote><blockquote class="prompt-tip"><p><strong>If you’re an extremely early-stage startup with no revenue yet</strong></p><ul><li>Define and present the service’s <strong>break-even point</strong><li>Do not inflate revenue-related metrics; set them realistically from a conservative viewpoint<li>Present a revenue scenario for the first year of monetization, and add a revenue plan for the next several years to build confidence that you can grow 
steadily<ul><li>1-year short-term projection<li>3-year mid-term projection<li>5-year long-term projection</ul><li>Actively use graphs and tables so the content can be grasped at a glance<li>Include <strong>hypothesis validation slides</strong> to strengthen the rationale by persuasively explaining why you set those KPIs and revenue scenarios<ul><li>You should build solid evidence for the projected revenue scenario through repeated experiments and hypothesis validation</ul></ul></blockquote><h3 id="the-team">The Team</h3><ul><li>Rather than introducing everyone, focus on key team members (including the CEO/founder) who play critical roles<li>For experience and skills, present ~2–3 items using logos, etc., to improve readability<li>If there are investors or advisors who have played (or are playing) key roles, it can be good to include them as well</ul><h3 id="future-growth-plan-milestones">Future growth plan (Milestones)</h3><ul><li>Present goals by time period and phase<li>Typically, goals are set up to the next funding stage (e.g., seed → until Series A; Series A → until Series B)<li>Present the desired investment amount and the use of funds<li>Rather than setting time buckets too long (e.g., half-year or more), present them in shorter increments such as ~2 months</ul><h3 id="financials">Financials</h3><p>For an IR deck, you should include financials.</p><ul><li>A financial plan for the next 3–5 years<li><strong>Unit economics</strong>: revenue and costs per customer unit<li><strong>Burn rate</strong>: the rate at which a startup spends cash on founding costs, R&amp;D, and other expenses<li>Total revenue and costs<li>EBITDA or a cash flow statement, etc.</ul><blockquote class="prompt-warning"><ul><li>Be careful not to present overly unrealistic financial plans<li>Forecast revenue is often overestimated while projected costs are underestimated, so be cautious when estimating expected revenue scale<li>Estimate costs as accurately as possible, considering 
product/service development costs as well as operating expenses</ul></blockquote><h2 id="what-to-emphasize-by-funding-stage">What to emphasize by funding stage</h2><h3 id="seed">Seed</h3><ul><li>The stage where you build an MVP, test market response, and validate the viability of the business model<li>You should strongly emphasize early hypotheses and business model validation results, MVP experiment results, and the resulting revenue (if any)</ul><h3 id="pre-a">Pre-A</h3><ul><li>The stage where you must prove growth potential and secure additional capital for product development, marketing, hiring, etc.<li>You need to explain what the core KPI is, how well you’re growing through what activities, and the potential for future growth</ul><h3 id="series-a">Series A</h3><ul><li>The stage of scaling in earnest and increasing company valuation<li>Since hypothesis validation should be complete by this point, you must earn investor trust with quantitative results demonstrating business performance</ul><h2 id="a-few-tips">A few tips</h2><ul><li>Put extra effort into the first five slides in particular to leave a positive first impression<li>It can be good to repeat the mission/vision from the first slide again on the last slide<li>Communicate everything in a top-down structure (lead with the conclusion)<li>The object of investment is the <strong>company</strong>, so the company name takes priority over the service name even in IR materials<li>Potential investors reading your IR materials may not be industry insiders, so explain using simple terms as much as possible, and add explanations when you must use jargon<li>Do not mix the market problem and the solution—separate them<li>Use text mainly as keywords; avoid screenshot images and improve readability with well-chosen visuals<li>Provide accurate and specific figures in tables or graphs<li>Be careful not to omit team introduction, desired investment amount, and the use of funds<li>It’s also good to present an exit strategy 
for returning capital to investors<li>Even if not perfect, briefly present a plan for what shareholder ownership composition/ratio will look like<li>Don’t overload the main deck; if needed, split detailed materials into appendices<li>Put contact information (email, phone number, name) on the last slide<li>Fonts matter a lot as well—use a highly readable font such as <a href="https://cactus.tistory.com/306">Pretendard</a>, and prepare a PDF to prevent rendering issues</ul><h2 id="references">References</h2><h3 id="kind-korea-investors-network-for-disclosure">KIND (Korea Investor’s Network for Disclosure)</h3><p><a href="https://kind.krx.co.kr/corpgeneral/irschedule.do?method=searchIRScheduleMain&amp;gubun=iRMaterials">https://kind.krx.co.kr/corpgeneral/irschedule.do?method=searchIRScheduleMain&amp;gubun=iRMaterials</a></p><ul><li>A corporate disclosure channel operated by the Korea Exchange (KRX)<li>Provides disclosure information for companies listed on KOSPI, KOSDAQ, and KONEX<li>Since you can review listed companies’ IR materials, you can also check how other recently produced IR materials are typically structured</ul>]]> </content> </entry> <entry><title xml:lang="en">Basic Concepts of Cryptography</title><link href="https://www.yunseo.kim/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="pt-BR" /><link 
href="https://www.yunseo.kim/fr/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/basic-concepts-of-cryptography/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-11-26T00:00:00+09:00</published> <updated>2025-11-26T00:00:00+09:00</updated> <id>https://www.yunseo.kim/posts/basic-concepts-of-cryptography/</id> <author> <name>Yunseo Kim</name> </author> <category term="Dev" /> <category term="Cryptography" /> <summary xml:lang="en">An introductory guide to modern cryptography: what cryptography is, symmetric and asymmetric cryptography, Kerckhoffs&apos;s principle, key exchange, RSA, and digital signatures.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>An introductory guide to modern cryptography: what cryptography is, symmetric and asymmetric cryptography, Kerckhoffs's principle, key exchange, RSA, and digital signatures.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="what-is-cryptography">What Is Cryptography</h2><p><strong>Cryptography</strong> is, at its core, a subfield of science whose goal is to protect <strong>protocols</strong> against adversarial actions.</p><p>Here, a protocol is a list of steps that one or more people must follow to accomplish something. 
For example, if you want to share the clipboard between devices, the following could be a protocol for clipboard sharing:</p><ol><li>When there is a change in the clipboard on any device, that device copies the clipboard contents and uploads them to a server.<li>The server notifies the other devices that the shared clipboard has changed.<li>The other devices download the shared clipboard contents from the server.</ol><p>However, this is not a good protocol: if you upload and download the clipboard contents in plaintext, someone in the middle of the communication—or even the server operator—can peek at the clipboard. Cryptography’s role is to defend against such adversaries who try to spy on the clipboard contents.</p><h2 id="symmetric-cryptography">Symmetric Cryptography</h2><h3 id="symmetric-encryption">Symmetric Encryption</h3><blockquote><p>Imagine that Alice needs to send a letter to Bob. To convey confidential information to Bob, Alice instructs a messenger to carry the letter and deliver it. However, Alice does not fully trust the messenger and wants the contents of the letter to remain secret from everyone except Bob—including the messenger who physically carries the letter.</p></blockquote><p>The type of cryptographic algorithm invented long ago for exactly this situation is the <strong>symmetric encryption algorithm</strong>.</p><blockquote class="prompt-info"><p><strong>Primitive</strong><br /> In everyday language, the word <em>primitive</em> means “rudimentary” or “something in a primitive state.” Cryptography also uses the term <em>primitive</em> frequently, but there it means the smallest building-block function or algorithm from which a cryptographic system is constructed. 
You can think of it as a “basic component” or “underlying logic.”</p></blockquote><p>Consider a primitive that provides the following two functions:</p><ul><li><code class="language-plaintext highlighter-rouge">ENCRYPT</code>: takes a <strong>secret key</strong> (usually a large number) and a <strong>message</strong> as input, and outputs a sequence of numbers as the encrypted message<li><code class="language-plaintext highlighter-rouge">DECRYPT</code>: the inverse of <code class="language-plaintext highlighter-rouge">ENCRYPT</code>; it takes the same secret key and the encrypted message as input and outputs the original message</ul><p>To use such a primitive to hide Alice’s message so that neither the messenger nor any third party can read it, Alice and Bob must first meet in advance and agree on some secret key. Afterwards, Alice can use the agreed secret key with the <code class="language-plaintext highlighter-rouge">ENCRYPT</code> function to encrypt her message and send the ciphertext to Bob via the messenger. Bob then uses the same secret key with the <code class="language-plaintext highlighter-rouge">DECRYPT</code> function to recover the original message.</p><p>Encrypting data using a secret key so that, to an outside observer, it is indistinguishable from meaningless noise is the standard way cryptography protects protocols.</p><p>Symmetric encryption belongs to the broader class of algorithms called <strong>symmetric cryptography</strong> or <strong>secret key cryptography</strong>, and in some cases there may be more than one key.</p><h2 id="kerckhoffss-principle">Kerckhoffs’s Principle</h2><p>Today, instead of paper letters, we use far more powerful communication tools—computers and the internet—to communicate almost in real time. 
But this also means that malicious “messengers” have become more powerful: they might be unsafe public Wi‑Fi at a café, ISPs, various networking equipment and servers that make up the internet and relay messages, government agencies, or even something inside your own device that runs the algorithms. Adversaries can observe many more messages in real time and can tamper with, eavesdrop on, or censor messages on nanosecond timescales without being noticed.</p><p>From a long history of trial and error in cryptography has emerged a cardinal rule for achieving trustworthy security: <u>cryptographic primitives must be subjected to public analysis</u>. The contrasting methodology is known as <strong>security by obscurity</strong>, whose limitations are clear and which has largely fallen out of use today.</p><p>This principle was first formulated in 11883 by the Dutch linguist and cryptographer Auguste Kerckhoffs and is known as <strong>Kerckhoffs’s principle</strong>. The same idea was expressed by Claude Shannon—an American mathematician, computer scientist, cryptographer, and the father of information theory—as “The enemy knows the system,” that is, “When designing a system, you must assume that the adversary will figure out how it works.” This is called <strong>Shannon’s maxim</strong>.</p><p>The security of a cryptosystem should depend only on the secrecy of the key; even if the cryptosystem itself is public, it should remain secure, and in fact it should be made actively public—like AES—so that many <strong>cryptanalysts</strong> can scrutinize and validate it. Anything secret is always at risk of leaking and therefore is a potential point of failure, so the smaller the secret part, the better for the defender. It is extremely difficult to keep an entire cryptosystem—large and complex—secret for a long time, but keeping only the key secret is relatively easy. 
Moreover, even if a secret is leaked, replacing a compromised key with a new one is far simpler than replacing the entire cryptosystem.</p><h2 id="asymmetric-cryptography">Asymmetric Cryptography</h2><p>Many real‑world protocols operate on symmetric cryptography, but this approach assumes that the two participants can meet at least once beforehand to agree on a key. Thus, the question of how to decide on a key in advance and share it securely arises; this is known as the <strong>key distribution</strong> problem. Key distribution was a long‑standing challenge, and it was only in the late 11970s that it was resolved with the development of a family of algorithms called <strong>asymmetric cryptography</strong> or <strong>public key cryptography</strong>.</p><p>Representative asymmetric cryptographic primitives include <strong>key exchange</strong>, <strong>asymmetric encryption</strong>, and <strong>digital signatures</strong>.</p><h3 id="key-exchange">Key Exchange</h3><p><strong>Key exchange</strong> works roughly as follows:</p><ol><li>Alice and Bob agree to use some common parameter set $G$.<li>Alice and Bob each choose their own <strong>private key</strong> $a, b$.<li>Alice and Bob combine the common parameters $G$ with their private keys $a$ and $b$ to compute <strong>public keys</strong> $A = f(G, a)$ and $B = f(G, b)$, and then share these public keys openly.<li>Alice uses Bob’s public key $B = f(G, b)$ and her private key $a$ to compute $f(B, a) = f(f(G, b), a)$, while Bob uses Alice’s public key $A = f(G, a)$ and his private key $b$ to compute $f(A, b) = f(f(G, a), b)$.<li>If we use a suitable function $f$ that satisfies $f(f(G, a), b) = f(f(G, b), a)$, then Alice and Bob end up sharing the same secret. 
A third party may know $G$ and the public keys $A = f(G, a)$ and $B = f(G, b)$, but cannot recover $f(A, b)$ from this information alone, so the secret is preserved.</ol><p>Typically, this shared secret is then used as the secret key for <a href="#symmetric-encryption">symmetric encryption</a> to exchange other messages.</p><p>The first published and most classic key‑exchange algorithm is the Diffie–Hellman key‑exchange algorithm, named after its creators Diffie and Hellman.</p><p>However, Diffie–Hellman key exchange also has limitations. Suppose an attacker intercepts the public keys $A = f(G, a)$ and $B = f(G, b)$ during the public‑key exchange phase and replaces them with the attacker’s own public key $M = f(G, m)$ before forwarding them to Alice and Bob. In that case, Alice and the attacker share a fake secret $f(M, a) = f(A, m)$, and Bob and the attacker share another fake secret $f(M, b) = f(B, m)$. The attacker can then impersonate Bob to Alice and Alice to Bob. In such a situation, we say that <u><strong>a man‑in‑the‑middle (MITM)</strong> has successfully attacked the protocol</u>. Because of this, key exchange does not in itself solve the problem of trust; it mainly helps simplify the procedure when there are many participants.</p><h3 id="asymmetric-encryption">Asymmetric Encryption</h3><p>Shortly after the invention of the Diffie–Hellman key‑exchange algorithm, a follow‑up invention appeared: the <strong>RSA algorithm</strong>, named after its inventors Ronald Rivest, Adi Shamir, and Leonard Adleman. RSA provides two primitives—asymmetric (public‑key) encryption and digital signatures—both belonging to asymmetric cryptography.</p><p>In <strong>asymmetric encryption</strong>, the basic goal of encrypting a message to ensure confidentiality is similar to <a href="#symmetric-encryption">symmetric encryption</a>. 
However, unlike symmetric encryption, which uses the same symmetric key for both encryption and decryption, asymmetric encryption has the following characteristics:</p><ul><li>It uses two keys: a public key and a private key.<li>Anyone can encrypt with the public key, but only the holder of the private key can decrypt.</ul><ol><li>There exists an open box (the public key) into which anyone can put a message and lock it; once locked, only Bob’s key (the private key) can open it.<li>Alice puts the message she wants to send into the box and locks it (encrypts it), then sends it to Bob.<li>After receiving the locked box (the ciphertext), Bob uses his private key to open the box and retrieve the message (decrypt it).</ol><h3 id="digital-signatures">Digital Signatures</h3><p>RSA not only provides asymmetric encryption, but also <strong>digital signatures</strong>. This signature primitive greatly helps build trust between Alice and Bob. When signing a message, the signer uses their private key; when someone else wants to verify the signature’s authenticity, they use the signed message, the signature, and the signer’s public key.</p><h2 id="the-utility-of-cryptography">The Utility of Cryptography</h2><p>Because the goal of cryptography is to protect protocols from adversarial actions, the utility of cryptography depends on what the protocol seeks to achieve. Most cryptographic primitives and protocols provide one or more of the following properties:</p><ul><li><strong>Confidentiality</strong>: hides and protects some information from parties who are not supposed to see it<li><strong>Authentication</strong>: identifies the communicating party (e.g., verifying that a received message really was sent by Alice)</ul><h2 id="the-cryptography-ecosystem">The Cryptography Ecosystem</h2><pre><code class="language-mermaid">flowchart TD
    Alice[Cryptography researcher]-- Invents primitive --&gt;Primitive(Proposes a new primitive)
    Alice-- Invents protocol --&gt;Protocol(Proposes a new protocol)
    Alice-. Hosts competition .-&gt;C(Algorithm competition)

    David[Private industry]-. Funds .-&gt;Alice
    David-. Hosts competition .-&gt;C

    Eve[Government agency]-. Funds .-&gt;Alice
    Eve-. Hosts competition .-&gt;C

    Primitive --&gt; t1{"Is it implementable?"}
    t1-- Yes --&gt;Protocol
    t1-- No --&gt;term1@{ shape: framed-circle, label: "Stop" }

    Protocol-- Enters competition --&gt;C
    Protocol-- Standardization --&gt;Standard(Standard)
    Protocol-- Files patent --&gt;Patent(Patent expires)
    Protocol-- Implementation --&gt;Library(Library)
    
    C-- Wins competition --&gt;Standard
    C-- Falls out of use --&gt;term2@{ shape: framed-circle, label: "Stop" }

    Standard-- Implementation --&gt;Library
    Standard-- Falls out of use --&gt;term3@{ shape: framed-circle, label: "Stop" }

    Patent-- Falls out of use --&gt;term2@{ shape: framed-circle, label: "Stop" }
    Patent-- Standardization --&gt;Standard
    Patent-- Implementation --&gt;Library

    Library-- Standardization --&gt;Standard
    Library-- Broken --&gt;term4@{ shape: framed-circle, label: "Stop" }
</code></pre>]]> </content> </entry> <entry><title xml:lang="en">Linear Transformations, Null Space, and Image</title><link href="https://www.yunseo.kim/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/linear-transformation-nullspace-and-image/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-09-18T00:00:00+09:00</published> <updated>2025-09-18T00:00:00+09:00</updated> <id>https://www.yunseo.kim/posts/linear-transformation-nullspace-and-image/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Linear Algebra" /> <summary xml:lang="en">Define linear transformations and study their null space (kernel) and image (range). 
Prove rank–nullity, relate injectivity/surjectivity to rank and nullity, and show how bases determine linear maps.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Define linear transformations and study their null space (kernel) and image (range). Prove rank–nullity, relate injectivity/surjectivity to rank and nullity, and show how bases determine linear maps.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/vectors-and-linear-combinations/">Vectors and Linear Combinations</a><li><a href="/posts/vector-spaces-subspaces-and-matrices/">Vector Spaces, Subspaces, and Matrices</a><li><a href="/posts/linear-dependence-and-independence-basis-and-dimension/">Linear Dependence and Independence, Bases and Dimension</a><li>Injection, surjection</ul><h2 id="linear-transformations">Linear transformations</h2><p>Functions that preserve the structure of vector spaces are called <strong>linear transformations</strong>. They are fundamental across pure and applied mathematics, social and natural sciences, and engineering.</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> Let $\mathbb{V}$ and $\mathbb{W}$ be $F$-vector spaces. A function $T: \mathbb{V} \to \mathbb{W}$ is called a <strong>linear transformation</strong> from $\mathbb{V}$ to $\mathbb{W}$ if, for all $\mathbf{x}, \mathbf{y} \in \mathbb{V}$ and $c \in F$, the following hold:</p><ol><li>$T(\mathbf{x}+\mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$<li>$T(c\mathbf{x}) = cT(\mathbf{x})$</ol></blockquote><p>When $T$ is a linear transformation, we also simply say that $T$ is <strong>linear</strong>. 
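</p>

<p>As a quick numerical illustration (a sketch added for intuition; the matrix and vectors below are arbitrary made-up examples, not from the post), any map of the form $T(\mathbf{x}) = A\mathbf{x}$ for a fixed matrix $A$ satisfies both conditions of the definition:</p>

```python
import numpy as np

# Hypothetical example matrix; T maps R^3 to R^2 via matrix multiplication.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

def T(x):
    return A @ x

x = np.array([1.0, 0.0, 2.0])
y = np.array([-1.0, 4.0, 0.5])
c = 3.0

# Condition 1 (additivity): T(x + y) = T(x) + T(y)
assert np.allclose(T(x + y), T(x) + T(y))
# Condition 2 (homogeneity): T(c x) = c T(x)
assert np.allclose(T(c * x), c * T(x))
```

<p>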
A linear transformation $T: \mathbb{V} \to \mathbb{W}$ satisfies the following four properties.</p><blockquote class="prompt-info"><ol><li>$T$ linear $\quad \Rightarrow \quad T(\mathbf{0}) = \mathbf{0}$<li>$T$ linear $\quad \Leftrightarrow \quad T(c\mathbf{x} + \mathbf{y}) = cT(\mathbf{x}) + T(\mathbf{y}) \; \forall \, \mathbf{x}, \mathbf{y} \in \mathbb{V},\, c \in F$<li>$T$ linear $\quad \Rightarrow \quad T(\mathbf{x} - \mathbf{y}) = T(\mathbf{x}) - T(\mathbf{y}) \; \forall \, \mathbf{x}, \mathbf{y} \in \mathbb{V}$<li>$T$ linear $\quad \Leftrightarrow \quad T\left( \sum_{i=1}^n a_i \mathbf{x}_i \right) = \sum_{i=1}^n a_i T(\mathbf{x}_i)$</ol></blockquote><blockquote class="prompt-tip"><p>When proving that a function is linear, it is often convenient to use Property 2.</p></blockquote><blockquote class="prompt-tip"><p>Linear algebra has wide and varied applications in geometry because many important geometric maps are linear. In particular, the three principal geometric transformations—<strong>rotation</strong>, <strong>reflection</strong>, and <strong>projection</strong>—are linear transformations.</p></blockquote><p>Two linear transformations occur especially often:</p><blockquote class="prompt-info"><p><strong>Identity and zero transformations</strong><br /> For $F$-vector spaces $\mathbb{V}, \mathbb{W}$:</p><ul><li><strong>Identity transformation</strong>: the function $I_\mathbb{V}: \mathbb{V} \to \mathbb{V}$ defined by $I_\mathbb{V}(\mathbf{x}) = \mathbf{x}$ for all $\mathbf{x} \in \mathbb{V}$<li><strong>Zero transformation</strong>: the function $T_0: \mathbb{V} \to \mathbb{W}$ defined by $T_0(\mathbf{x}) = \mathbf{0}$ for all $\mathbf{x} \in \mathbb{V}$</ul></blockquote><p>Many other familiar operations are linear transformations.</p><blockquote class="prompt-tip"><p><strong>Examples of linear transformations</strong></p><ul><li>Rotation<li>Reflection<li>Projection<li><a 
href="/posts/vector-spaces-subspaces-and-matrices/#transpose-symmetric-and-skew-symmetric-matrices">Transpose</a><li>Differentiation of a differentiable function<li>Integration of a continuous function</ul></blockquote><h2 id="null-space-and-image">Null space and image</h2><h3 id="definitions-of-the-null-space-and-the-image">Definitions of the null space and the image</h3><blockquote class="prompt-info"><p><strong>Definition</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$ and a linear transformation $T: \mathbb{V} \to \mathbb{W}$:</p><ul><li><p><strong>Null space</strong> (or <strong>kernel</strong>): the set of vectors $\mathbf{x} \in \mathbb{V}$ such that $T(\mathbf{x}) = \mathbf{0}$, denoted $\mathrm{N}(T)$</p>\[\mathrm{N}(T) = \{ \mathbf{x} \in \mathbb{V}: T(\mathbf{x}) = \mathbf{0} \}\]<li><p><strong>Range</strong> (or <strong>image</strong>): the subset of $\mathbb{W}$ consisting of all values of $T$, denoted $\mathrm{R}(T)$</p>\[\mathrm{R}(T) = \{ T(\mathbf{x}): \mathbf{x} \in \mathbb{V} \}\]</ul></blockquote><blockquote class="prompt-tip"><p><strong>e.g.</strong> For vector spaces $\mathbb{V}, \mathbb{W}$, the identity $I: \mathbb{V} \to \mathbb{V}$ and the zero map $T_0: \mathbb{V} \to \mathbb{W}$ satisfy:</p><ul><li>$\mathrm{N}(I) = \{\mathbf{0}\}$<li>$\mathrm{R}(I) = \mathbb{V}$<li>$\mathrm{N}(T_0) = \mathbb{V}$<li>$\mathrm{R}(T_0) = \{\mathbf{0}\}$</ul></blockquote><p>A key point going forward is that the null space and the image of a linear transformation are <a href="/posts/vector-spaces-subspaces-and-matrices/#subspaces">subspaces</a> of the corresponding vector spaces.</p><blockquote class="prompt-info"><p><strong>Theorem 1</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$ and a linear transformation $T: \mathbb{V} \to \mathbb{W}$, the sets $\mathrm{N}(T)$ and $\mathrm{R}(T)$ are subspaces of $\mathbb{V}$ and $\mathbb{W}$, respectively.</p><p><strong>Proof</strong><br /> Denote the zero vectors of $\mathbb{V}$ and $\mathbb{W}$ by 
$\mathbf{0}_\mathbb{V}$ and $\mathbf{0}_\mathbb{W}$, respectively.</p><p>Since $T(\mathbf{0}_\mathbb{V}) = \mathbf{0}_\mathbb{W}$, we have $\mathbf{0}_\mathbb{V} \in \mathrm{N}(T)$. Moreover, for $\mathbf{x}, \mathbf{y} \in \mathrm{N}(T)$ and $c \in F$,</p>\[\begin{align*} T(\mathbf{x} + \mathbf{y}) &amp;= T(\mathbf{x}) + T(\mathbf{y}) = \mathbf{0}_\mathbb{W} + \mathbf{0}_\mathbb{W} = \mathbf{0}_\mathbb{W}, \\ T(c\mathbf{x}) &amp;= cT(\mathbf{x}) = c\mathbf{0}_\mathbb{W} = \mathbf{0}_\mathbb{W}. \end{align*}\]<p>$\therefore$ <a href="/posts/vector-spaces-subspaces-and-matrices/#subspaces">Since $\mathbf{0}_\mathbb{V} \in \mathrm{N}(T)$ and $\mathrm{N}(T)$ is closed under addition and scalar multiplication, $\mathrm{N}(T)$ is a subspace of $\mathbb{V}$</a>.</p><p>Similarly, $T(\mathbf{0}_\mathbb{V}) = \mathbf{0}_\mathbb{W}$ implies $\mathbf{0}_\mathbb{W} \in \mathrm{R}(T)$. For all $\mathbf{x}, \mathbf{y} \in \mathrm{R}(T)$ and $c \in F$ (there exist $\mathbf{v}, \mathbf{w} \in \mathbb{V}$ with $T(\mathbf{v}) = \mathbf{x}$ and $T(\mathbf{w}) = \mathbf{y}$), we have</p>\[\begin{align*} T(\mathbf{v} + \mathbf{w}) &amp;= T(\mathbf{v}) + T(\mathbf{w}) = \mathbf{x} + \mathbf{y}, \\ T(c\mathbf{v}) &amp;= cT(\mathbf{v}) = c\mathbf{x}. \end{align*}\]<p>$\therefore$ <a href="/posts/vector-spaces-subspaces-and-matrices/#subspaces">Since $\mathbf{0}_\mathbb{W} \in \mathrm{R}(T)$ and $\mathrm{R}(T)$ is closed under addition and scalar multiplication, $\mathrm{R}(T)$ is a subspace of $\mathbb{W}$</a>. 
$\blacksquare$</p></blockquote><p>Furthermore, given a basis $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \}$ of $\mathbb{V}$, we can find a generating set of the image $\mathrm{R}(T)$ as follows.</p><blockquote class="prompt-info"><p><strong>Theorem 2</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$, a linear transformation $T: \mathbb{V} \to \mathbb{W}$, and a <a href="/posts/linear-dependence-and-independence-basis-and-dimension/#basis">basis</a> $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \}$ of $\mathbb{V}$, we have</p>\[\mathrm{R}(T) = \mathrm{span}(\{T(\mathbf{v}): \mathbf{v} \in \beta \}) = \mathrm{span}(\{T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n) \})\]<p><strong>Proof</strong></p>\[T(\mathbf{v}_i) \in \mathrm{R}(T) \quad \forall \mathbf{v}_i \in \beta.\]<p>Since $\mathrm{R}(T)$ is a subspace, by <strong>Theorem 2</strong> of <a href="/posts/vector-spaces-subspaces-and-matrices/#subspaces">Vector Spaces, Subspaces, and Matrices</a>,</p>\[\mathrm{span}(\{T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n) \}) = \mathrm{span}(\{T(\mathbf{v}_i): \mathbf{v}_i \in \beta \}) \subseteq \mathrm{R}(T).\]<p>Also,</p>\[\forall \mathbf{w} \in \mathrm{R}(T) \ (\exists \mathbf{v} \in \mathbb{V} \ (\mathbf{w} = T(\mathbf{v}))).\]<p>Because $\beta$ is a basis of $\mathbb{V}$,</p>\[\mathbf{v} = \sum_{i=1}^n a_i \mathbf{v}_i \quad \text{(where } a_1, a_2, \dots, a_n \in F \text{)}.\]<p>Since $T$ is linear,</p>\[\mathbf{w} = T(\mathbf{v}) = \sum_{i=1}^n a_i T(\mathbf{v}_i) \in \mathrm{span}(\{T(\mathbf{v}_i): \mathbf{v}_i \in \beta \})\] \[\mathrm{R}(T) \subseteq \mathrm{span}(\{T(\mathbf{v}_i): \mathbf{v}_i \in \beta \}) = \mathrm{span}(\{T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n) \}).\]<p>$\therefore$ Since each set contains the other, $\mathrm{R}(T) = \mathrm{span}(\{T(\mathbf{v}): \mathbf{v} \in \beta \})$. 
$\blacksquare$</p></blockquote><p>This theorem remains valid even when the basis $\beta$ is infinite.</p><h3 id="dimension-theorem">Dimension theorem</h3><p>Because the null space and image are especially important subspaces, we give special names to their <a href="/posts/linear-dependence-and-independence-basis-and-dimension/#dimension">dimensions</a>.</p><blockquote class="prompt-info"><p>For vector spaces $\mathbb{V}, \mathbb{W}$ and a linear transformation $T: \mathbb{V} \to \mathbb{W}$, assume $\mathrm{N}(T)$ and $\mathrm{R}(T)$ are finite-dimensional.</p><ul><li><strong>Nullity</strong>: the dimension of $\mathrm{N}(T)$, denoted $\mathrm{nullity}(T)$<li><strong>Rank</strong>: the dimension of $\mathrm{R}(T)$, denoted $\mathrm{rank}(T)$</ul></blockquote><p>For a linear transformation, the larger the nullity, the smaller the rank, and vice versa.</p><blockquote class="prompt-info"><p><strong>Theorem 3: Dimension theorem</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$ and a linear transformation $T: \mathbb{V}\to \mathbb{W}$, if $\mathbb{V}$ is finite-dimensional, then</p>\[\mathrm{nullity}(T) + \mathrm{rank}(T) = \dim(\mathbb{V})\]</blockquote><h4 id="proof">Proof</h4><p>Let $\dim(\mathbb{V}) = n$ and $\mathrm{nullity}(T) = \dim(\mathrm{N}(T)) = k$, and let a basis of $\mathrm{N}(T)$ be $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k \}$.</p><p>By <a href="/posts/linear-dependence-and-independence-basis-and-dimension/#dimension-of-subspaces">“Linear Dependence and Independence, Bases and Dimension” — <strong>Corollary 6-1</strong></a>, we can extend $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k \}$ to a basis $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \}$ of $\mathbb{V}$.</p><p>We now show that $S = \{T(\mathbf{v}_{k+1}), T(\mathbf{v}_{k+2}), \dots, T(\mathbf{v}_n) \}$ is a basis of $\mathrm{R}(T)$. 
First, for $1 \leq i \leq k$, $T(\mathbf{v}_i) = \mathbf{0}$, so by <a href="#definitions-of-the-null-space-and-the-image"><strong>Theorem 2</strong></a>,</p>\[\begin{align*} \mathrm{R}(T) &amp;= \mathrm{span}(\{T(\mathbf{v}_1), T(\mathbf{v}_2), \dots, T(\mathbf{v}_n) \}) \\ &amp;= \mathrm{span}(\{T(\mathbf{v}_{k+1}), T(\mathbf{v}_{k+2}), \dots, T(\mathbf{v}_n) \}) \\ &amp;= \mathrm{span}(S). \end{align*}\]<p>Thus $S$ generates $\mathrm{R}(T)$. By <a href="/posts/linear-dependence-and-independence-basis-and-dimension/#dimension"><strong>Corollary 5-2 of the replacement theorem</strong></a>, it suffices to show that $S$ is linearly independent.</p><p>Suppose $\sum_{i=k+1}^n b_i T(\mathbf{v}_i) = \mathbf{0}$ (with $b_{k+1}, b_{k+2}, \dots, b_n \in F$). Since $T$ is linear,</p>\[\sum_{i=k+1}^n b_i T(\mathbf{v}_i) = \mathbf{0} \Leftrightarrow T\left(\sum_{i=k+1}^n b_i \mathbf{v}_i \right) = \mathbf{0} \Leftrightarrow \sum_{i=k+1}^n b_i \mathbf{v}_i \in \mathrm{N}(T).\]<p>Therefore,</p>\[\begin{align*} &amp;\exists c_1, c_2, \dots, c_k \in F, \\ &amp;\sum_{i=k+1}^n b_i \mathbf{v}_i = \sum_{i=1}^k c_i \mathbf{v}_i \\ \Leftrightarrow &amp;\sum_{i=1}^k (-c_i)\mathbf{v}_i + \sum_{i=k+1}^n b_i \mathbf{v}_i = \mathbf{0}. \end{align*}\]<p>Since $\beta$ is a basis of $\mathbb{V}$, the unique solution of $\sum_{i=1}^k (-c_i)\mathbf{v}_i + \sum_{i=k+1}^n b_i \mathbf{v}_i = \mathbf{0}$ is</p>\[c_1 = c_2 = \cdots = c_k = b_{k+1} = b_{k+2} = \cdots = b_n = 0\]<p>and hence</p>\[\sum_{i=k+1}^n b_i T(\mathbf{v}_i) = \mathbf{0} \quad \Rightarrow \quad b_i = 0.\]<p>Thus $S$ is linearly independent and is a basis of $\mathrm{R}(T)$.</p>\[\therefore \mathrm{rank}(T) = n - k = \dim(\mathbb{V}) - \mathrm{nullity}(T). 
\blacksquare\]<h3 id="linear-transformations-and-injectionssurjections">Linear transformations and injections/surjections</h3><p>For linear transformations, injectivity and surjectivity are closely tied to rank and nullity.</p><blockquote class="prompt-info"><p><strong>Theorem 4</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$ and a linear transformation $T: \mathbb{V} \to \mathbb{W}$,</p>\[T \text{ is injective} \quad \Leftrightarrow \quad \mathrm{N}(T) = \{\mathbf{0}\}.\]</blockquote><blockquote class="prompt-info"><p><strong>Theorem 5</strong><br /> If finite-dimensional vector spaces $\mathbb{V}, \mathbb{W}$ have the same dimension and $T: \mathbb{V} \to \mathbb{W}$ is linear, then the following four statements are equivalent.</p><ol><li>$T$ is injective.<li>$\mathrm{nullity}(T) = 0$<li>$\mathrm{rank}(T) = \dim(\mathbb{V})$<li>$T$ is surjective.</ol></blockquote><p>Using the <a href="#dimension-theorem">dimension theorem</a>, <a href="#linear-transformations">Properties 1 and 3 of linear transformations</a>, and <a href="/posts/linear-dependence-and-independence-basis-and-dimension/#dimension-of-subspaces">“Linear Dependence and Independence, Bases and Dimension” — <strong>Theorem 6</strong></a>, one can prove <strong>Theorem 4</strong> and <strong>Theorem 5</strong>.</p><p>These two theorems are useful when deciding whether a given linear transformation is injective or surjective.</p><blockquote class="prompt-warning"><p>For an infinite-dimensional vector space $\mathbb{V}$ and a linear transformation $T: \mathbb{V} \to \mathbb{V}$, injectivity and surjectivity are not equivalent.</p></blockquote><p>If a linear transformation is injective, the following theorem can be useful in some cases for testing whether a subset of the domain is linearly independent.</p><blockquote class="prompt-info"><p><strong>Theorem 6</strong><br /> For vector spaces $\mathbb{V}, \mathbb{W}$, an injective linear transformation $T: \mathbb{V} \to \mathbb{W}$, and a subset $S 
\subseteq \mathbb{V}$,</p>\[S \text{ is linearly independent} \quad \Leftrightarrow \quad \{T(\mathbf{v}): \mathbf{v} \in S \} \text{ is linearly independent.}\]</blockquote><h2 id="linear-transformations-and-bases">Linear transformations and bases</h2><p>A key feature of linear transformations is that their action is determined by their values on a basis.</p><blockquote class="prompt-info"><p><strong>Theorem 7</strong><br /> Let $\mathbb{V}, \mathbb{W}$ be $F$-vector spaces, let $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \}$ be a basis of $\mathbb{V}$, and let $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n \in \mathbb{W}$. Then there exists a unique linear transformation $T: \mathbb{V} \to \mathbb{W}$ such that</p>\[T(\mathbf{v}_i) = \mathbf{w}_i \quad (i = 1, 2, \dots, n).\]<p><strong>Proof</strong><br /> For $\mathbf{x} \in \mathbb{V}$, the representation</p>\[\mathbf{x} = \sum_{i=1}^n a_i \mathbf{v}_i \text{ (}a_1, a_2, \dots, a_n \in F \text{)}\]<p>is unique. Define a linear transformation $T: \mathbb{V} \to \mathbb{W}$ by</p>\[T(\mathbf{x}) = T\left( \sum_{i=1}^n a_i \mathbf{v}_i \right) = \sum_{i=1}^n a_i \mathbf{w}_i.\]<p>i) For $i = 1, 2, \dots, n$, $T(\mathbf{v}_i) = \mathbf{w}_i$.</p><p>ii) Suppose another linear transformation $U: \mathbb{V} \to \mathbb{W}$ satisfies $U(\mathbf{v}_i) = \mathbf{w}_i$ for $i = 1, 2, \dots, n$. Then for $\mathbf{x} = \sum_{i=1}^n a_i \mathbf{v}_i \in \mathbb{V}$,</p>\[U(\mathbf{x}) = \sum_{i=1}^n a_i U(\mathbf{v}_i) = \sum_{i=1}^n a_i \mathbf{w}_i = T(\mathbf{x})\] \[\therefore U = T.\]<p>From i) and ii), the linear transformation satisfying $T(\mathbf{v}_i) = \mathbf{w}_i$ for $i = 1, 2, \dots, n$ is unique and given by</p>\[T(\mathbf{x}) = T\left( \sum_{i=1}^n a_i \mathbf{v}_i \right) = \sum_{i=1}^n a_i \mathbf{w}_i. 
\ \blacksquare\]<p><strong>Corollary 7-1</strong><br /> Let $\mathbb{V}, \mathbb{W}$ be vector spaces and suppose $\mathbb{V}$ has a finite basis $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \}$. If two linear transformations $U, T: \mathbb{V} \to \mathbb{W}$ satisfy $U(\mathbf{v}_i) = T(\mathbf{v}_i)$ for $i = 1, 2, \dots, n$, then $U = T$.<br /> In other words, <u>if two linear transformations agree on a basis, they are equal.</u></p></blockquote>]]> </content> </entry> <entry><title xml:lang="en">Linear Dependence and Independence, Bases and Dimension</title><link href="https://www.yunseo.kim/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="pl" /><link 
href="https://www.yunseo.kim/cs/posts/linear-dependence-and-independence-basis-and-dimension/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-09-16T00:00:00+09:00</published> <updated>2025-10-25T21:21:39+09:00</updated> <id>https://www.yunseo.kim/posts/linear-dependence-and-independence-basis-and-dimension/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Linear Algebra" /> <summary xml:lang="en">A concise guide to linear dependence and independence, and to bases and dimension of vector spaces: definitions, key propositions, replacement theorem, and subspace dimension.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>A concise guide to linear dependence and independence, and to bases and dimension of vector spaces: definitions, key propositions, replacement theorem, and subspace dimension.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/vectors-and-linear-combinations/">Vectors and Linear Combinations</a><li><a href="/posts/vector-spaces-subspaces-and-matrices/">Vector Spaces, Subspaces, and Matrices</a></ul><h2 id="linear-dependence-and-linear-independence">Linear dependence and linear independence</h2><p>Given a <a href="/posts/vector-spaces-subspaces-and-matrices/#vector-spaces">vector space</a> $\mathbb{V}$ and a <a href="/posts/vector-spaces-subspaces-and-matrices/#subspaces">subspace</a> $\mathbb{W}$, suppose we wish to find a minimal finite subset $S$ that <a href="/posts/vectors-and-linear-combinations/#the-linear-combination-cmathbfv--dmathbfw">spans</a> $\mathbb{W}$.</p><p>Let $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4 \}$ with $\mathrm{span}(S) = \mathbb{W}$. How can we decide whether there exists a proper subset of $S$ that still spans $\mathbb{W}$? 
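</p>

<p>Numerically, one practical test (an illustrative sketch with made-up vectors, not part of the original discussion) compares the rank of the matrix whose columns are the vectors of $S$ with the number of vectors: if the rank is smaller, some vector is redundant and a proper subset already spans the same space.</p>

```python
import numpy as np

# Hypothetical vectors in R^3; u4 is deliberately u1 + u2, so S is redundant.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
u3 = np.array([0.0, 0.0, 1.0])
u4 = u1 + u2

S = np.column_stack([u1, u2, u3, u4])   # 3x4 matrix with the u_i as columns
rank = np.linalg.matrix_rank(S)

# rank < number of vectors  =>  a proper subset of S spans the same space.
assert rank == 3 and S.shape[1] == 4
```

<p>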
This is equivalent to asking whether some vector in $S$ can be written as a <a href="/posts/vectors-and-linear-combinations/#linear-combinations-of-vectors">linear combination</a> of the others. For example, a necessary and sufficient condition for expressing $\mathbf{u}_4$ as a linear combination of the remaining three vectors is the existence of scalars $a_1, a_2, a_3$ satisfying</p>\[\mathbf{u}_4 = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3\]<p>However, solving a new linear system each time for $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, $\mathbf{u}_4$ is tedious. Instead, consider</p>\[a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + a_4\mathbf{u}_4 = \mathbf{0}\]<p>If some vector in $S$ is a linear combination of the others, then there exists a representation of the zero vector as a linear combination of elements of $S$ in which at least one among $a_1, a_2, a_3, a_4$ is nonzero. The converse is also true: if there is a nontrivial linear combination of vectors in $S$ that equals the zero vector (i.e., at least one of $a_1, a_2, a_3, a_4$ is nonzero), then some vector in $S$ is a linear combination of the others.</p><p>Generalizing this, we define <strong>linear dependence</strong> and <strong>linear independence</strong> as follows.</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> For a subset $S$ of a vector space $\mathbb{V}$, if there exist finitely many distinct vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n \in S$ and scalars $a_1, a_2, \dots, a_n$, not all $0$, such that $a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n = \mathbf{0}$, then the set $S$ (and those vectors) is called <strong>linearly dependent</strong>. 
Otherwise, it is called <strong>linearly independent</strong>.</p></blockquote><p>For any vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$, if $a_1 = a_2 = \cdots = a_n = 0$ then $a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n = \mathbf{0}$; this is called the <strong>trivial representation of the zero vector</strong>.</p><p>The following three propositions about linearly independent sets hold in every vector space. In particular, <strong>Proposition 3</strong> is very useful for testing whether a finite set is linearly independent.</p><blockquote class="prompt-info"><ul><li><strong>Proposition 1</strong>: The empty set is linearly independent. A set must be nonempty to be linearly dependent.<li><strong>Proposition 2</strong>: A set consisting of a single nonzero vector is linearly independent.<li><strong>Proposition 3</strong>: A set is linearly independent if and only if the only way to express $\mathbf{0}$ as a linear combination of its vectors is the trivial one.</ul></blockquote><p>The following theorems are also important.</p><blockquote class="prompt-info"><p><strong>Theorem 1</strong><br /> If $\mathbb{V}$ is a vector space and $S_1 \subseteq S_2 \subseteq \mathbb{V}$, then $S_2$ is linearly dependent whenever $S_1$ is linearly dependent.</p><p><strong>Corollary 1-1</strong><br /> If $\mathbb{V}$ is a vector space and $S_1 \subseteq S_2 \subseteq \mathbb{V}$, then $S_1$ is linearly independent whenever $S_2$ is linearly independent.</p></blockquote><blockquote class="prompt-info"><p><strong>Theorem 2</strong><br /> Let $\mathbb{V}$ be a vector space and $S$ a linearly independent subset. 
For a vector $\mathbf{v} \in \mathbb{V}\setminus S$, $S \cup \{\mathbf{v}\}$ is linearly dependent if and only if $\mathbf{v} \in \mathrm{span}(S)$.</p><p>In other words, <strong>if no proper subset of $S$ spans the same space as $S$, then $S$ is linearly independent.</strong></p></blockquote><h2 id="bases-and-dimension">Bases and dimension</h2><h3 id="basis">Basis</h3><p>A spanning set $S$ of $\mathbb{W}$ that is <a href="#linear-dependence-and-linear-independence">linearly independent</a> has a special property: every vector in $\mathbb{W}$ can be expressed as a linear combination of $S$, and that expression is unique (<strong>Theorem 3</strong>). Thus, we define a linearly independent spanning set of a vector space to be a <strong>basis</strong>.</p><blockquote class="prompt-info"><p><strong>Definition of a basis</strong><br /> For a vector space $\mathbb{V}$ and a subset $\beta$, if $\beta$ is linearly independent and spans $\mathbb{V}$, then $\beta$ is called a <strong>basis</strong> of $\mathbb{V}$. In this case, the vectors in $\beta$ are said to form a basis of $\mathbb{V}$.</p></blockquote><blockquote class="prompt-tip"><p>$\mathrm{span}(\emptyset) = \{\mathbf{0}\}$ and $\emptyset$ is linearly independent. 
Therefore, $\emptyset$ is a basis of the zero space.</p></blockquote><p>In particular, the following distinguished basis of $F^n$ is called the <strong>standard basis</strong> of $F^n$.</p><blockquote class="prompt-info"><p><strong>Definition of the standard basis</strong><br /> For the vector space $F^n$, consider</p>\[\mathbf{e}_1 = (1,0,0,\dots,0),\ \mathbf{e}_2 = (0,1,0,\dots,0),\ \dots, \mathbf{e}_n = (0,0,0,\dots,1)\]<p>Then the set $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n \}$ is a basis of $F^n$, called the <strong>standard basis</strong>.</p></blockquote><blockquote class="prompt-info"><p><strong>Theorem 3</strong><br /> Let $\mathbb{V}$ be a vector space and $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n \in \mathbb{V}$ be distinct vectors. A necessary and sufficient condition for $\beta = \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n \}$ to be a basis of $\mathbb{V}$ is that every vector $\mathbf{v} \in \mathbb{V}$ can be expressed as a linear combination of vectors in $\beta$, and that this expression is unique. That is, there exist unique scalars $(a_1, a_2, \dots, a_n)$ such that</p>\[\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n\]</blockquote><p>By <strong>Theorem 3</strong>, if the distinct vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ form a basis of a vector space $\mathbb{V}$, then within $\mathbb{V}$, a vector $\mathbf{v}$ uniquely determines the scalar $n$-tuple $(a_1, a_2, \dots, a_n)$, and conversely a scalar $n$-tuple uniquely determines the corresponding vector $\mathbf{v}$. We will revisit this when studying <strong>invertibility</strong> and <strong>isomorphisms</strong>; in this case, $\mathbb{V}$ and $F^n$ are <u>essentially the same</u>.</p><blockquote class="prompt-info"><p><strong>Theorem 4</strong><br /> If $S$ is a finite set with $\mathrm{span}(S) = \mathbb{V}$, then some subset of $S$ is a basis of $\mathbb{V}$. 
In particular, in this case every basis of $\mathbb{V}$ is finite.</p></blockquote><blockquote class="prompt-tip"><p>Many vector spaces fall under the scope of <strong>Theorem 4</strong>, but not all do. <u>A basis need not be finite</u>.</p></blockquote><h3 id="dimension">Dimension</h3><blockquote class="prompt-info"><p><strong>Theorem 5: Replacement theorem</strong><br /> Let $G$ be a set of $n$ vectors with $\mathrm{span}(G) = \mathbb{V}$. If $L$ is a subset of $\mathbb{V}$ consisting of $m$ linearly independent vectors, then $m \le n$. Moreover, there exists a set $H \subseteq G$ with $n-m$ vectors such that $\mathrm{span}(L \cup H) = \mathbb{V}$.</p></blockquote><p>From this we obtain two very important corollaries.</p><blockquote class="prompt-info"><p><strong>Corollary 5-1 of the replacement theorem</strong><br /> If a vector space $\mathbb{V}$ has a finite basis, then every basis of $\mathbb{V}$ is finite and all bases have the same number of vectors.</p></blockquote><p>Hence the number of vectors in a basis of $\mathbb{V}$ is an invariant, intrinsic property of $\mathbb{V}$, called its <strong>dimension</strong>.</p><blockquote class="prompt-info"><p><strong>Definition of dimension</strong><br /> A vector space that has a finite basis is called <strong>finite-dimensional</strong>; in this case, the number $n$ of basis elements is the <strong>dimension</strong> of the vector space, denoted $\dim(\mathbb{V})$. 
A vector space that is not finite-dimensional is called <strong>infinite-dimensional</strong>.</p></blockquote><blockquote class="prompt-tip"><ul><li>$\dim(\{\mathbf{0}\}) = 0$<li>$\dim(F^n) = n$<li>$\dim(\mathcal{M}_{m \times n}(F)) = mn$</ul></blockquote><blockquote class="prompt-tip"><p>The dimension of a vector space depends on the underlying field.</p><ul><li>Over the complex field $\mathbb{C}$, the complex numbers form a 1-dimensional vector space with basis $\{1\}$<li>Over the real field $\mathbb{R}$, the complex numbers form a 2-dimensional vector space with basis $\{1,i\}$</ul></blockquote><p>In a finite-dimensional vector space $\mathbb{V}$, any subset with more than $\dim(\mathbb{V})$ vectors can never be linearly independent.</p><blockquote class="prompt-info"><p><strong>Corollary 5-2 of the replacement theorem</strong><br /> Let $\mathbb{V}$ be a vector space of dimension $n$.</p><ol><li>Any finite spanning set of $\mathbb{V}$ has at least $n$ vectors, and any spanning set of $\mathbb{V}$ with exactly $n$ vectors is a basis.<li>Any linearly independent subset of $\mathbb{V}$ with exactly $n$ vectors is a basis of $\mathbb{V}$.<li>Any linearly independent subset of $\mathbb{V}$ can be extended to a basis. That is, if $L \subseteq \mathbb{V}$ is linearly independent, there exists a basis $\beta \supseteq L$ of $\mathbb{V}$.</ol></blockquote><h3 id="dimension-of-subspaces">Dimension of subspaces</h3><blockquote class="prompt-info"><p><strong>Theorem 6</strong><br /> In a finite-dimensional vector space $\mathbb{V}$, every subspace $\mathbb{W}$ is finite-dimensional and satisfies $\dim(\mathbb{W}) \le \dim(\mathbb{V})$. 
In particular, if $\dim(\mathbb{W}) = \dim(\mathbb{V})$, then $\mathbb{V} = \mathbb{W}$.</p><p><strong>Corollary 6-1</strong><br /> For a subspace $\mathbb{W}$ of a finite-dimensional vector space $\mathbb{V}$, any basis of $\mathbb{W}$ can be extended to a basis of $\mathbb{V}$.</p></blockquote><p>By <strong>Theorem 6</strong>, the dimension of a subspace of $\mathbb{R}^3$ can be $0,1,2,$ or $3$.</p><ul><li>0-dimensional: the zero space $\{\mathbf{0}\}$ containing only the origin ($\mathbf{0}$)<li>1-dimensional: a line through the origin ($\mathbf{0}$)<li>2-dimensional: a plane containing the origin ($\mathbf{0}$)<li>3-dimensional: the entire 3D Euclidean space</ul>]]> </content> </entry> <entry><title xml:lang="en">Vector Spaces, Subspaces, and Matrices</title><link href="https://www.yunseo.kim/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="pl" /><link 
href="https://www.yunseo.kim/cs/posts/vector-spaces-subspaces-and-matrices/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-09-13T00:00:00+09:00</published> <updated>2025-10-28T18:44:54+09:00</updated> <id>https://www.yunseo.kim/posts/vector-spaces-subspaces-and-matrices/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Linear Algebra" /> <summary xml:lang="en">Define vector spaces and subspaces with canonical examples (R^n, matrix, and function spaces). Focus on matrix spaces: symmetric/skew, triangular, and diagonal subspaces.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Define vector spaces and subspaces with canonical examples (R^n, matrix, and function spaces). Focus on matrix spaces: symmetric/skew, triangular, and diagonal subspaces.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>Matrix</strong><ul><li>The entry of a matrix $A$ in the $i$-th row and $j$-th column is denoted $A_{ij}$ or $a_{ij}$<li><strong>Diagonal entry</strong>: an entry $a_{ij}$ with $i=j$<li>The components $a_{i1}, a_{i2}, \dots, a_{in}$ are the $i$-th <strong>row</strong> of the matrix<ul><li>Each row of a matrix can be regarded as a vector in $F^n$<li>Moreover, a row vector in $F^n$ can be viewed as another matrix of size $1 \times n$</ul><li>The components $a_{1j}, a_{2j}, \dots, a_{mj}$ are the $j$-th <strong>column</strong> of the matrix<ul><li>Each column of a matrix can be regarded as a vector in $F^m$<li>Moreover, a column vector in $F^m$ can be viewed as another matrix of size $m \times 1$</ul><li><strong>Zero matrix</strong>: a matrix all of whose entries are $0$, denoted by $O$<li><strong>Square matrix</strong>: a matrix with the same number of rows and columns<li>For two $m \times n$ matrices $A, B$, if $A_{ij} = B_{ij}$ for all $1 \leq i \leq m$, 
$1 \leq j \leq n$ (i.e., every corresponding entry agrees), then the two matrices are defined to be <strong>equal</strong> ($A=B$)<li><strong>Transpose (transposed matrix)</strong>: for an $m \times n$ matrix $A$, the $n \times m$ matrix $A^T$ obtained by swapping rows and columns of $A$<li><strong>Symmetric matrix</strong>: a square matrix $A$ with $A^T = A$<li><strong>Skew-symmetric matrix</strong>: a square matrix $B$ with $B^T = -B$<li><strong>Triangular matrix</strong><ul><li><strong>Upper triangular matrix</strong>: a matrix whose entries below the diagonal are all $0$ (i.e., $i&gt;j \Rightarrow A_{ij}=0$), usually denoted by $U$<li><strong>Lower triangular matrix</strong>: a matrix whose entries above the diagonal are all $0$ (i.e., $i&lt;j \Rightarrow A_{ij}=0$), usually denoted by $L$</ul><li><strong>Diagonal matrix</strong>: an $n \times n$ square matrix whose off-diagonal entries are all $0$ (i.e., $i \neq j \Rightarrow M_{ij}=0$), usually denoted by $D$</ul><li>Representative vector spaces<ul><li><strong>The $n$-tuples $F^n$</strong>:<ul><li>The set of all $n$-tuples with entries in a field $F$<li>Denoted $F^n$; an $F$-vector space</ul><li><strong>Matrix space</strong>:<ul><li>The set of all $m \times n$ matrices with entries in a field $F$<li>Denoted $\mathcal{M}_{m \times n}(F)$; a vector space</ul><li><strong>Function space</strong>:<ul><li>For a nonempty set $S$ and a field $F$, the set of all functions from $S$ to $F$<li>Denoted $\mathcal{F}(S,F)$; a vector space</ul></ul><li><strong>Subspace</strong><ul><li>A subset $\mathbb{W}$ of an $F$-vector space $\mathbb{V}$ is called a <strong>subspace</strong> of $\mathbb{V}$ if it is an $F$-vector space under the same addition and scalar multiplication as defined on $\mathbb{V}$<li>For every vector space $\mathbb{V}$, both $\mathbb{V}$ itself and $\{0\}$ are subspaces; in particular, $\{0\}$ is called the <strong>zero subspace</strong><li>If a subset of a vector space contains the zero vector and is 
closed under <a href="/posts/vectors-and-linear-combinations/#linear-combinations-of-vectors">linear combinations</a> (i.e., if $\mathrm{span}(\mathbb{W})=\mathbb{W}$), then it is a subspace</ul></ul></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/vectors-and-linear-combinations/">Vectors and Linear Combinations</a></ul><h2 id="vector-spaces">Vector spaces</h2><p>As briefly noted in <a href="/posts/vectors-and-linear-combinations/#vector-in-the-broad-sense-an-element-of-a-vector-space">Vectors and Linear Combinations</a>, the definitions of vectors and vector spaces as algebraic structures are as follows.</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> A <strong>vector space</strong> (or <strong>linear space</strong>) $\mathbb{V}$ over a field $F$ is a set equipped with two operations, <strong>sum</strong> and <strong>scalar multiplication</strong>, satisfying the following eight axioms. Elements of the field $F$ are called <strong>scalars</strong>, and elements of the vector space $\mathbb{V}$ are called <strong>vectors</strong>.</p><ul><li><strong>Sum</strong>: For $\mathbf{x}, \mathbf{y} \in \mathbb{V}$, there exists a unique element $\mathbf{x} + \mathbf{y} \in \mathbb{V}$. We call $\mathbf{x} + \mathbf{y}$ the <strong>sum</strong> of $\mathbf{x}$ and $\mathbf{y}$.<li><strong>Scalar multiplication</strong>: For $a \in F$ and $\mathbf{x} \in \mathbb{V}$, there exists a unique element $a\mathbf{x} \in \mathbb{V}$. We call $a\mathbf{x}$ a <strong>scalar multiple</strong> of $\mathbf{x}$.</ul><ol><li>For all $\mathbf{x},\mathbf{y} \in \mathbb{V}$, $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$. (commutativity of addition)<li>For all $\mathbf{x},\mathbf{y},\mathbf{z} \in \mathbb{V}$, $(\mathbf{x}+\mathbf{y})+\mathbf{z} = \mathbf{x}+(\mathbf{y}+\mathbf{z})$. 
(associativity of addition)<li>There exists $\mathbf{0} \in \mathbb{V}$ such that $\mathbf{x} + \mathbf{0} = \mathbf{x}$ for all $\mathbf{x} \in \mathbb{V}$. (zero vector, additive identity)<li>For each $\mathbf{x} \in \mathbb{V}$, there exists $\mathbf{y} \in \mathbb{V}$ such that $\mathbf{x}+\mathbf{y}=\mathbf{0}$. (additive inverse)<li>For each $\mathbf{x} \in \mathbb{V}$, $1\mathbf{x} = \mathbf{x}$. (multiplicative identity)<li>For all $a,b \in F$ and $\mathbf{x} \in \mathbb{V}$, $(ab)\mathbf{x} = a(b\mathbf{x})$. (associativity of scalar multiplication)<li>For all $a \in F$ and $\mathbf{x},\mathbf{y} \in \mathbb{V}$, $a(\mathbf{x}+\mathbf{y}) = a\mathbf{x} + a\mathbf{y}$. (distributivity of scalar multiplication over vector addition)<li>For all $a,b \in F$ and $\mathbf{x},\mathbf{y} \in \mathbb{V}$, $(a+b)\mathbf{x} = a\mathbf{x} + b\mathbf{x}$. (distributivity of scalar multiplication over field addition)</ol></blockquote><p>Strictly speaking, one should write “the $F$-vector space $\mathbb{V}$,” but when discussing vector spaces the specific field is often not essential; thus, when there is no risk of confusion, we omit $F$ and simply write “the vector space $\mathbb{V}$.”</p><h3 id="matrix-spaces">Matrix spaces</h3><h4 id="row-and-column-vectors">Row and column vectors</h4><p>The set of all $n$-tuples with entries in a field $F$ is denoted $F^n$. 
For $u = (a_1, a_2, \dots, a_n) \in F^n$ and $v = (b_1, b_2, \dots, b_n) \in F^n$, defining addition and scalar multiplication by</p>\[\begin{align*} u + v &amp;= (a_1+b_1, a_2+b_2, \dots, a_n+b_n), \\ cu &amp;= (ca_1, ca_2, \dots, ca_n) \end{align*}\]<p>makes $F^n$ into an $F$-vector space.</p><p>Vectors in $F^n$ are usually written as <strong>column vectors</strong> rather than standalone <strong>row vectors</strong> $(a_1, a_2, \dots, a_n)$:</p>\[\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}\]<blockquote class="prompt-tip"><p>Because column-vector notation takes more vertical space, one often uses the <a href="#transpose-symmetric-and-skew-symmetric-matrices">transpose</a> to write $(a_1, a_2, \dots, a_n)^T$ instead.</p></blockquote><h4 id="matrices-and-matrix-spaces">Matrices and matrix spaces</h4><p>An $m \times n$ <strong>matrix</strong> with entries in $F$ is a rectangular array, typically denoted by italic capitals ($A, B, C$, etc.):</p>\[\begin{pmatrix} a_{11} &amp; a_{12} &amp; \cdots &amp; a_{1n} \\ a_{21} &amp; a_{22} &amp; \cdots &amp; a_{2n} \\ \vdots &amp; \vdots &amp; &amp; \vdots \\ a_{m1} &amp; a_{m2} &amp; \cdots &amp; a_{mn} \end{pmatrix}\]<ul><li>The entry of a matrix $A$ in the $i$-th row and $j$-th column is denoted $A_{ij}$ or $a_{ij}$.<li>Each $a_{ij}$ ($1 \leq i \leq m$, $1 \leq j \leq n$) belongs to $F$.<li>An entry $a_{ij}$ with $i=j$ is called a <strong>diagonal entry</strong>.<li>The components $a_{i1}, a_{i2}, \dots, a_{in}$ form the $i$-th <strong>row</strong> of the matrix. Each row can be regarded as a vector in $F^n$, and, furthermore, a row vector in $F^n$ can be viewed as another matrix of size $1 \times n$.<li>The components $a_{1j}, a_{2j}, \dots, a_{mj}$ form the $j$-th <strong>column</strong> of the matrix. 
Each column can be regarded as a vector in $F^m$, and, furthermore, a column vector in $F^m$ can be viewed as another matrix of size $m \times 1$.<li>An $m \times n$ matrix whose entries are all $0$ is called the <strong>zero matrix</strong>, denoted $O$.<li>A matrix with the same number of rows and columns is called a <strong>square matrix</strong>.<li>For two $m \times n$ matrices $A, B$, if $A_{ij} = B_{ij}$ for all $1 \leq i \leq m$, $1 \leq j \leq n$ (i.e., every corresponding entry agrees), we define the matrices to be <strong>equal</strong> ($A=B$).</ul><p>The set of all $m \times n$ matrices with entries in $F$ is denoted $\mathcal{M}_{m \times n}(F)$. For $\mathbf{A},\mathbf{B} \in \mathcal{M}_{m \times n}(F)$ and $c \in F$, defining addition and scalar multiplication by</p>\[\begin{align*} (\mathbf{A}+\mathbf{B})_{ij} &amp;= \mathbf{A}_{ij} + \mathbf{B}_{ij}, \\ (c\mathbf{A})_{ij} &amp;= c\mathbf{A}_{ij} \end{align*} \qquad (1 \leq i \leq m,\ 1 \leq j \leq n)\]<p>makes $\mathcal{M}_{m \times n}(F)$ a vector space, called a <strong>matrix space</strong>.</p><p>This naturally extends the operations defined on $F^n$ and $F^m$.</p><h3 id="function-spaces">Function spaces</h3><p>For a nonempty set $S$ and a field $F$, $\mathcal{F}(S,F)$ denotes the set of all functions from $S$ to $F$. 
For $f,g \in \mathcal{F}(S,F)$, we declare $f$ and $g$ <strong>equal</strong> ($f=g$) if $f(s) = g(s)$ for all $s \in S$.</p><p>For $f,g \in \mathcal{F}(S,F)$, $c \in F$, and $s \in S$, defining addition and scalar multiplication by</p>\[\begin{align*} (f + g)(s) &amp;= f(s) + g(s), \\ (cf)(s) &amp;= c[f(s)] \end{align*}\]<p>makes $\mathcal{F}(S,F)$ a vector space, called a <strong>function space</strong>.</p><h2 id="subspaces">Subspaces</h2><blockquote class="prompt-info"><p><strong>Definition</strong><br /> A subset $\mathbb{W}$ of an $F$-vector space $\mathbb{V}$ is called a <strong>subspace</strong> of $\mathbb{V}$ if it is an $F$-vector space under the same addition and scalar multiplication as those defined on $\mathbb{V}$.</p></blockquote><p>For every vector space $\mathbb{V}$, both $\mathbb{V}$ itself and $\{0\}$ are subspaces; in particular, $\{0\}$ is called the <strong>zero subspace</strong>.</p><p>Whether a subset is a subspace can be checked using the following theorem.</p><blockquote class="prompt-info"><p><strong>Theorem 1</strong><br /> For a vector space $\mathbb{V}$ and a subset $\mathbb{W}$, $\mathbb{W}$ is a subspace of $\mathbb{V}$ if and only if the following three conditions hold (with the operations inherited from $\mathbb{V}$):</p><ol><li>$\mathbf{0} \in \mathbb{W}$<li>$\mathbf{x}+\mathbf{y} \in \mathbb{W} \quad \forall\ \mathbf{x} \in \mathbb{W},\ \mathbf{y} \in \mathbb{W}$<li>$c\mathbf{x} \in \mathbb{W} \quad \forall\ c \in F,\ \mathbf{x} \in \mathbb{W}$</ol><p>In short, if it contains the zero vector and is closed under <a href="/posts/vectors-and-linear-combinations/#linear-combinations-of-vectors">linear combinations</a> (i.e., if $\mathrm{span}(\mathbb{W})=\mathbb{W}$), then it is a subspace.</p></blockquote><p>The following theorems also hold.</p><blockquote class="prompt-info"><p><strong>Theorem 2</strong></p><ul><li><p>For any subset $S$ of a vector space $\mathbb{V}$, the span $\mathrm{span}(S)$ is a subspace of $\mathbb{V}$ 
containing $S$.</p>\[S \subset \mathrm{span}(S) \leq \mathbb{V} \quad \forall\ S \subset \mathbb{V}.\]<li><p>Any subspace of $\mathbb{V}$ that contains $S$ must contain the span of $S$.</p>\[\mathbb{W}\supset \mathrm{span}(S) \quad \forall\ S \subset \mathbb{W} \leq \mathbb{V}.\]</ul></blockquote><blockquote class="prompt-info"><p><strong>Theorem 3</strong><br /> For subspaces of a vector space $\mathbb{V}$, the intersection of any collection of such subspaces is again a subspace of $\mathbb{V}$.</p></blockquote><h3 id="transpose-symmetric-and-skew-symmetric-matrices">Transpose, symmetric, and skew-symmetric matrices</h3><p>The <strong>transpose</strong> $A^T$ of an $m \times n$ matrix $A$ is the $n \times m$ matrix obtained by swapping the rows and columns of $A$:</p>\[(A^T)_{ij} = A_{ji}\] \[\begin{pmatrix} 1 &amp; 2 &amp; 3 \\ 4 &amp; 5 &amp; 6 \end{pmatrix}^T = \begin{pmatrix} 1 &amp; 4 \\ 2 &amp; 5 \\ 3 &amp; 6 \end{pmatrix}\]<p>A matrix $A$ with $A^T = A$ is called <strong>symmetric</strong>, and a matrix $B$ with $B^T = -B$ is called <strong>skew-symmetric</strong>. Symmetric and skew-symmetric matrices must be square.</p><p>Let $\mathbb{W}_1$ and $\mathbb{W}_2$ be the sets of all symmetric and all skew-symmetric matrices in $\mathcal{M}_{n \times n}(F)$, respectively. 
Then $\mathbb{W}_1$ and $\mathbb{W}_2$ are subspaces of $\mathcal{M}_{n \times n}(F)$; that is, they are closed under addition and scalar multiplication.</p><h3 id="triangular-and-diagonal-matrices">Triangular and diagonal matrices</h3><p>These two classes of matrices are also particularly important.</p><p>First, we collectively call the following two types of matrices <strong>triangular matrices</strong>:</p><ul><li><strong>Upper triangular matrix</strong>: a matrix whose entries below the diagonal are all $0$ (i.e., $i&gt;j \Rightarrow A_{ij}=0$), usually denoted by $U$<li><strong>Lower triangular matrix</strong>: a matrix whose entries above the diagonal are all $0$ (i.e., $i&lt;j \Rightarrow A_{ij}=0$), usually denoted by $L$</ul><p>An $n \times n$ square matrix in which all off-diagonal entries are $0$ (i.e., $i \neq j \Rightarrow M_{ij}=0$) is called a <strong>diagonal matrix</strong>, usually denoted by $D$. A diagonal matrix is both upper and lower triangular.</p><p>The sets of upper triangular matrices, lower triangular matrices, and diagonal matrices are all subspaces of $\mathcal{M}_{n \times n}(F)$.</p>]]> </content> </entry> <entry><title xml:lang="en">Inner Product and Norm</title><link href="https://www.yunseo.kim/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="pt-BR" /><link 
href="https://www.yunseo.kim/fr/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/inner-product-and-norm/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-09-10T00:00:00+09:00</published> <updated>2025-10-15T05:48:53+09:00</updated> <id>https://www.yunseo.kim/posts/inner-product-and-norm/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Linear Algebra" /> <summary xml:lang="en">Define the inner product and the dot product, derive vector length/norm from them, and see how to compute the angle between vectors in R^n and general inner product spaces.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Define the inner product and the dot product, derive vector length/norm from them, and see how to compute the angle between vectors in R^n and general inner product spaces.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/vectors-and-linear-combinations/">Vectors and Linear Combinations</a></ul><h2 id="inner-product">Inner Product</h2><p>In a general $F$-vector space, the definition of an <strong>inner product</strong> is as follows.</p><blockquote class="prompt-info"><p><strong>Definition of the inner product and inner product space</strong><br /> Consider an $F$-vector space $\mathbb{V}$. 
An <strong>inner product</strong> on $\mathbb{V}$, denoted $\langle \mathbf{x},\mathbf{y} \rangle$, is a function that assigns to each ordered pair of vectors $\mathbf{x}, \mathbf{y} \in \mathbb{V}$ a scalar in $F$ and satisfies the following:</p><p>For all $\mathbf{x},\mathbf{y},\mathbf{z} \in \mathbb{V}$ and all $c \in F$,</p><ol><li>$\langle \mathbf{x}+\mathbf{z}, \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{z}, \mathbf{y} \rangle$<li>$\langle c\mathbf{x}, \mathbf{y} \rangle = c \langle \mathbf{x}, \mathbf{y} \rangle$<li>$\overline{\langle \mathbf{x}, \mathbf{y} \rangle} = \langle \mathbf{y}, \mathbf{x} \rangle$ (where the overline denotes complex conjugation)<li>If $\mathbf{x} \neq \mathbf{0}$, then $\langle \mathbf{x}, \mathbf{x} \rangle$ is positive.</ol><p>An $F$-vector space $\mathbb{V}$ equipped with an inner product is called an <strong>inner product space</strong>. In particular, when $F=\mathbb{C}$ it is a <strong>complex inner product space</strong>, and when $F=\mathbb{R}$ it is a <strong>real inner product space</strong>.</p></blockquote><p>In particular, the following inner product is called the <strong>standard inner product</strong>. One can check that it satisfies all four axioms above.</p><blockquote class="prompt-info"><p><strong>Definition of the standard inner product</strong><br /> For two vectors in $F^n$, $\mathbf{x}=(a_1, a_2, \dots, a_n)$ and $\mathbf{y}=(b_1, b_2, \dots, b_n)$, the <strong>standard inner product</strong> on $F^n$ is defined by</p>\[\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n a_i \overline{b_i}\]</blockquote><p>When $F=\mathbb{R}$, complex conjugation is trivial, so the standard inner product becomes $\sum_{i=1}^n a_i b_i$. 
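</p><p>As a quick numerical sketch (the helper name below is mine, not from the post), the standard inner product can be implemented directly from the definition; Python's built-in <code>complex</code> type supplies the conjugation:</p>

```python
def standard_inner_product(x, y):
    # Standard inner product on F^n: sum of x_i * conj(y_i).
    # For real entries the conjugate is trivial, so this reduces
    # to the ordinary componentwise product-and-sum.
    assert len(x) == len(y)
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

# Complex example: (1+i)*conj(i) + 2*conj(1) = (1 - i) + 2 = 3 - i
print(standard_inner_product([1 + 1j, 2], [1j, 1]))  # (3-1j)

# Real example: conjugation does nothing, 1*4 + 2*5 + 3*6 = 32
print(standard_inner_product([1, 2, 3], [4, 5, 6]))  # (32+0j)
```

<p>Axiom 3 can be checked the same way, by comparing <code>standard_inner_product(x, y)</code> with the conjugate of <code>standard_inner_product(y, x)</code> for a few sample vectors.</p><p>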
In this special case we often write $\mathbf{x} \cdot \mathbf{y}$ instead of $\langle \mathbf{x}, \mathbf{y} \rangle$ and call it the <strong>dot product</strong> or <strong>scalar product</strong>.</p><blockquote class="prompt-info"><p><strong>Definition of the dot product/scalar product</strong><br /> For $\mathbf{v}=(v_1, v_2, \dots, v_n)$ and $\mathbf{w}=(w_1, w_2, \dots, w_n)$ in $\mathbb{R}^n$, the <strong>dot product</strong> (or <strong>scalar product</strong>) is defined by</p>\[\mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^n v_i w_i = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n\]</blockquote><blockquote class="prompt-warning"><p>The “scalar product” mentioned here is an operation between two vectors and is distinct from the operation between a scalar and a vector, “scalar multiplication,” discussed in <a href="/posts/vectors-and-linear-combinations/">Vectors and Linear Combinations</a>. The English terms are similar, and <a href="https://www.kms.or.kr/mathdict/list.html?key=kname&amp;keyword=%EC%8A%A4%EC%B9%BC%EB%9D%BC%EA%B3%B1">per the Korean Mathematical Society’s standard terminology the Korean translations are identical</a>, so be careful not to confuse them.</p><p>To avoid confusion, I will refer to it as the <strong>dot product</strong> whenever possible.</p></blockquote><blockquote class="prompt-tip"><p>In Euclidean space, the inner product coincides with the dot product, so when the context is clear, the dot product is often simply called the inner product. Strictly speaking, however, an inner product is a more general notion that includes the dot product as a special case.</p></blockquote><pre><code class="language-mermaid">flowchart TD
    A["Inner Product"] --&gt;|includes| B["Standard Inner Product"]
    B --&gt;|"when F = R (real numbers)"| C["Dot/Scalar Product"]

    %% inclusion notation
    C -. included in .-&gt; B
    B -. included in .-&gt; A
</code></pre><h2 id="lengthnorm-of-a-vector">Length/Norm of a Vector</h2><p>For a vector $\mathbf{v}=(v_1, v_2, \dots, v_n)$ in $\mathbb{R}^n$, the Euclidean length of $\mathbf{v}$ is defined via the dot product as</p>\[\| \mathbf{v} \| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \left[ \sum_{i=1}^n |v_i|^2 \right]^{1/2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}\]<p>More generally, in any inner product space, the <strong>length</strong> or <strong>norm</strong> of a vector is defined by</p>\[\| \mathbf{x} \| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}\]<p>In a general inner product space, the norm satisfies the following fundamental properties.</p><blockquote class="prompt-info"><p><strong>Theorem</strong><br /> Let $\mathbb{V}$ be an $F$-inner product space and let $\mathbf{x}, \mathbf{y} \in \mathbb{V}$ and $c \in F$. Then:</p><ol><li>$\|c\mathbf{x}\| = |c| \cdot \|\mathbf{x}\|$<li>The following hold:<ul><li>$\|\mathbf{x}\| = 0 \iff \mathbf{x}=\mathbf{0}$<li>$\|\mathbf{x}\| \geq 0 \ \forall \mathbf{x}$</ul><li><strong>Cauchy–Schwarz inequality</strong>: $| \langle \mathbf{x}, \mathbf{y} \rangle | \leq \|\mathbf{x}\| \cdot \|\mathbf{y}\|$ (with equality if and only if one of $\mathbf{x}$ and $\mathbf{y}$ is a scalar multiple of the other)<li><strong>Triangle inequality</strong>: $\| \mathbf{x} + \mathbf{y} \| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$ (with equality if and only if one is a scalar multiple of the other and they point in the same direction)</ol></blockquote><h2 id="angle-between-vectors-and-unit-vectors">Angle Between Vectors and Unit Vectors</h2><p>A vector of length $1$ is called a <strong>unit vector</strong>. 
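</p><p>The norm axioms and the two inequalities above are easy to verify numerically; here is a minimal sketch in plain Python (the helper names are mine):</p>

```python
import math

def dot(v, w):
    # Dot product on R^n
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    # ||v|| = square root of dot(v, v)
    return math.sqrt(dot(v, v))

v, w = [1.0, 2.0, 2.0], [3.0, 0.0, 4.0]
print(norm(v))  # 3.0, since sqrt(1 + 4 + 4) = 3

# Property 1: norm of c*v equals |c| times norm of v
print(norm([-0.5 * x for x in v]))  # 1.5

# Cauchy-Schwarz: |dot(v, w)| is at most norm(v) * norm(w)
assert abs(dot(v, w)) <= norm(v) * norm(w)

# Triangle inequality: norm(v + w) is at most norm(v) + norm(w)
s = [a + b for a, b in zip(v, w)]
assert norm(s) <= norm(v) + norm(w)
```

<p>Equality in the Cauchy–Schwarz check would require $\mathbf{v}$ and $\mathbf{w}$ to be scalar multiples of each other, which is not the case here.</p><p>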
For two vectors $\mathbf{v}=(v_1, v_2, \dots, v_n)$ and $\mathbf{w}=(w_1, w_2, \dots, w_n)$ in $\mathbb{R}^n$, we have $\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \cdot \|\mathbf{w}\| \cos\theta$, from which the angle $\theta$ between $\mathbf{v}$ and $\mathbf{w}$ ($0 \leq \theta \leq \pi$) can be obtained:</p>\[\theta = \arccos{\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\| \cdot \|\mathbf{w}\|}}\]<p>If $\mathbf{v} \cdot \mathbf{w} = 0$, the two vectors are said to be <strong>perpendicular</strong> or <strong>orthogonal</strong>.</p><blockquote class="prompt-tip"><p>If vectors $\mathbf{v}$ and $\mathbf{w}$ are perpendicular, then</p>\[\begin{align*} \| \mathbf{v} + \mathbf{w} \|^2 &amp;= (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w}) \\ &amp;= \mathbf{v} \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{w} + \mathbf{w} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} \\ &amp;= \mathbf{v} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} \\ &amp;= \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2. \end{align*}\]</blockquote><p>Generalizing to an arbitrary inner product space:</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> Let $\mathbb{V}$ be an inner product space. For vectors $\mathbf{x}, \mathbf{y} \in \mathbb{V}$, if $\langle \mathbf{x}, \mathbf{y} \rangle = 0$, then $\mathbf{x}$ and $\mathbf{y}$ are said to be <strong>orthogonal</strong> or <strong>perpendicular</strong>. Moreover,</p><ol><li>For a subset $S \subset \mathbb{V}$, if any two distinct vectors in $S$ are orthogonal, then $S$ is called an <strong>orthogonal set</strong>.<li>A vector $\mathbf{x} \in \mathbb{V}$ with $\|\mathbf{x}\|=1$ is called a <strong>unit vector</strong>.<li>If a subset $S \subset \mathbb{V}$ is an orthogonal set consisting only of unit vectors, then $S$ is called an <strong>orthonormal set</strong>.</ol></blockquote><p>A set $S = \{ \mathbf{v}_1, \mathbf{v}_2, \dots \}$ is orthonormal if and only if $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \delta_{ij}$. 
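</p><p>The angle formula and the Kronecker-delta characterization can be sketched as follows (plain-Python helpers; the names are assumptions for illustration):</p>

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(v, w):
    # theta = arccos( dot(v, w) / (norm(v) * norm(w)) ), in [0, pi]
    return math.acos(dot(v, w) / (norm(v) * norm(w)))

print(round(math.degrees(angle([1, 0], [0, 1])), 6))  # 90.0: orthogonal
print(round(math.degrees(angle([1, 0], [1, 1])), 6))  # 45.0

# The standard basis of R^3 is an orthonormal set:
# dot(e_i, e_j) equals the Kronecker delta.
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(all(dot(e[i], e[j]) == (1 if i == j else 0)
          for i in range(3)
          for j in range(3)))  # True
```

<p>Scaling one basis vector by $2$, say, would break orthonormality, since its norm would no longer be $1$.</p><p>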
Multiplying a vector by a nonzero scalar does not affect orthogonality.</p><p>For any nonzero vector $\mathbf{x}$, the vector $\cfrac{\mathbf{x}}{\|\mathbf{x}\|}$ is a unit vector. Obtaining a unit vector by multiplying a nonzero vector by the reciprocal of its length is called <strong>normalizing</strong>.</p>]]> </content> </entry> <entry><title xml:lang="en">Vectors and Linear Combinations</title><link href="https://www.yunseo.kim/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/vectors-and-linear-combinations/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-09-07T00:00:00+09:00</published> <updated>2025-10-28T20:47:49+09:00</updated> <id>https://www.yunseo.kim/posts/vectors-and-linear-combinations/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Linear Algebra" /> <summary xml:lang="en">Learn what vectors 
are, how to represent them, and the basics of vector operations (addition, scalar multiplication). Build intuition for linear combinations and span.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Learn what vectors are, how to represent them, and the basics of vector operations (addition, scalar multiplication). Build intuition for linear combinations and span.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>Definition of a vector</strong><ul><li><strong>Vector in the narrow sense (Euclidean vector)</strong>: a physical quantity that has both magnitude and direction<li><strong>Vector in the broad, linear-algebraic sense</strong>: an element of a vector space</ul><li><strong>Ways to represent vectors</strong><ul><li><strong>Arrow representation</strong>: the vector’s magnitude is the length of the arrow, and its direction is the arrow’s direction. 
It is easy to visualize and intuitive, but it is difficult to represent higher-dimensional vectors (4D and above) or non-Euclidean vectors.<li><strong>Component representation</strong>: place the tail of the vector at the origin of a coordinate space and express the vector by the coordinates of its head.</ul><li><strong>Basic operations on vectors</strong><ul><li><strong>Sum</strong>: $(a_1, a_2, \cdots, a_n) + (b_1, b_2, \cdots, b_n) := (a_1+b_1, a_2+b_2, \cdots, a_n+b_n)$<li><strong>Scalar multiplication</strong>: $c(a_1, a_2, \cdots, a_n) := (ca_1, ca_2, \cdots, ca_n)$</ul><li><strong>Linear combination of vectors</strong><ul><li>For finitely many vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ and scalars $a_1, a_2, \dots, a_n$, a vector $\mathbf{v}$ satisfying $\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n$ is called a <strong>linear combination</strong> of $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$.<li>The numbers $a_1, a_2, \dots, a_n$ are called the <strong>coefficients</strong> of this linear combination.</ul><li><strong>Span</strong><ul><li>For a nonempty subset $S$ of a vector space $\mathbb{V}$, the set of all linear combinations formed from vectors in $S$, denoted $\mathrm{span}(S)$.<li>By definition, $\mathrm{span}(\emptyset) = \{0\}$.<li>For a subset $S$ of a vector space $\mathbb{V}$, if $\mathrm{span}(S) = \mathbb{V}$, then $S$ is said to generate (or span) $\mathbb{V}$.</ul></ul></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li>Coordinate plane/coordinate space<li>Field</ul><h2 id="what-is-a-vector">What is a vector?</h2><h3 id="vector-in-the-narrow-sense-euclidean-vector">Vector in the narrow sense: Euclidean vector</h3><blockquote class="prompt-info"><p>Many physical quantities such as force, velocity, and acceleration carry not only magnitude but also directional information. 
A physical quantity that has both magnitude and direction is called a <strong>vector</strong>.</p></blockquote><p>The definition above is the one used in mechanics in physics and in high-school-level mathematics. A vector in this geometric sense—“the magnitude and direction of a directed line segment,” grounded in physical intuition—is more precisely called a <strong>Euclidean vector</strong>.</p><h3 id="vector-in-the-broad-sense-an-element-of-a-vector-space">Vector in the broad sense: an element of a vector space</h3><p>In linear algebra, vectors are defined more broadly than Euclidean vectors, as an abstract algebraic structure:</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> A <strong>vector space</strong> (or <strong>linear space</strong>) $\mathbb{V}$ over a field $F$ is a set equipped with two operations, <strong>sum</strong> and <strong>scalar multiplication</strong>, satisfying the following eight axioms. Elements of the field $F$ are called <strong>scalars</strong>, and elements of the vector space $\mathbb{V}$ are called <strong>vectors</strong>.</p><ul><li><strong>Sum</strong>: For any $\mathbf{x}, \mathbf{y} \in \mathbb{V}$, there exists a unique element $\mathbf{x} + \mathbf{y} \in \mathbb{V}$. We call $\mathbf{x} + \mathbf{y}$ the <strong>sum</strong> of $\mathbf{x}$ and $\mathbf{y}$.<li><strong>Scalar multiplication</strong>: For any $a \in F$ and $\mathbf{x} \in \mathbb{V}$, there exists a unique element $a\mathbf{x} \in \mathbb{V}$. In this case, $a\mathbf{x}$ is called the <strong>scalar multiple</strong> of $\mathbf{x}$.</ul><ol><li>For all $\mathbf{x},\mathbf{y} \in \mathbb{V}$, $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$. (commutativity of addition)<li>For all $\mathbf{x},\mathbf{y},\mathbf{z} \in \mathbb{V}$, $(\mathbf{x}+\mathbf{y})+\mathbf{z} = \mathbf{x}+(\mathbf{y}+\mathbf{z})$. 
(associativity of addition)<li>There exists $\mathbf{0} \in \mathbb{V}$ such that $\mathbf{x} + \mathbf{0} = \mathbf{x}$ for all $\mathbf{x} \in \mathbb{V}$. (zero vector, additive identity)<li>For each $\mathbf{x} \in \mathbb{V}$, there exists $\mathbf{y} \in \mathbb{V}$ such that $\mathbf{x} + \mathbf{y} = \mathbf{0}$. (additive inverse)<li>For each $\mathbf{x} \in \mathbb{V}$, $1\mathbf{x} = \mathbf{x}$. (multiplicative identity)<li>For all $a,b \in F$ and $\mathbf{x} \in \mathbb{V}$, $(ab)\mathbf{x} = a(b\mathbf{x})$. (associativity of scalar multiplication)<li>For all $a \in F$ and $\mathbf{x},\mathbf{y} \in \mathbb{V}$, $a(\mathbf{x}+\mathbf{y}) = a\mathbf{x} + a\mathbf{y}$. (distributivity of scalar multiplication over vector addition)<li>For all $a,b \in F$ and $\mathbf{x} \in \mathbb{V}$, $(a+b)\mathbf{x} = a\mathbf{x} + b\mathbf{x}$. (distributivity of scalar multiplication over field addition)</ol></blockquote><p>This definition of a vector in linear algebra encompasses a broader class than the previously mentioned <a href="#vector-in-the-narrow-sense-euclidean-vector">Euclidean vector</a>. You can verify that <a href="#vector-in-the-narrow-sense-euclidean-vector">Euclidean vectors</a> satisfy these eight properties.</p><p>The origin and development of vectors are closely tied to practical problems in physics—such as describing force, motion, rotation, and fields quantitatively. The concept was first introduced as <a href="#vector-in-the-narrow-sense-euclidean-vector">Euclidean vectors</a> to meet the physical need to mathematically express natural phenomena. Mathematics then generalized and systematized these physical ideas, establishing formal structures such as vector spaces, inner products, and exterior products, leading to today’s definition of vectors. 
In other words, vectors are concepts demanded by physics and formalized by mathematics—an interdisciplinary product developed through close interaction between the two communities, rather than a creation of pure mathematics alone.</p><p>The <a href="#vector-in-the-narrow-sense-euclidean-vector">Euclidean vectors</a> handled in classical mechanics can be expressed within a <a href="#vector-in-the-broad-sense-an-element-of-a-vector-space">more general framework</a> mathematically. Modern physics actively uses not only <a href="#vector-in-the-narrow-sense-euclidean-vector">Euclidean vectors</a> but also more abstract notions defined in mathematics—vector spaces, function spaces, etc.—and attaches physical meaning to them. Hence it is inappropriate to regard the two definitions of a vector as merely “the physical definition” and “the mathematical definition.”</p><p>We will defer a deeper dive into vector spaces and, for now, focus on Euclidean vectors—vectors in the narrow sense that admit geometric representation in coordinate spaces. Building intuition with Euclidean vectors first will be helpful when generalizing to other kinds of vectors later.</p><h2 id="ways-to-represent-vectors">Ways to represent vectors</h2><h3 id="arrow-representation">Arrow representation</h3><p>This is the most common and most geometrically intuitive representation. The vector’s magnitude is represented by the length of an arrow, and its direction by the direction of the arrow.</p><p><img src="https://upload.wikimedia.org/wikipedia/commons/9/95/Vector_from_A_to_B.svg" alt="Euclidean Vector from A to B" width="972" /></p><blockquote><p><em>Image credits</em></p><ul><li>Author: Wikipedia user <a href="https://en.wikipedia.org/wiki/User:Nguyenthephuc">Nguyenthephuc</a><li>License: <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">CC BY-SA 3.0</a></ul></blockquote><p>While intuitive, this arrow representation has clear limitations for higher-dimensional vectors (4D and above). 
Moreover, we will eventually need to handle non-Euclidean vectors that are not easily depicted geometrically, so it is important to become comfortable with the component representation described next.</p><h3 id="component-representation">Component representation</h3><p>Regardless of where a vector is located, if its magnitude and direction are the same, we consider it the same vector. Therefore, given a coordinate space, if we fix the tail of the vector at the origin of that coordinate space, then <u>an $n$-dimensional vector corresponds to exactly one point in $n$-dimensional space</u>, and we can represent the vector by the coordinates of its head. This is called the <strong>component representation</strong> of a vector.</p>\[(a_1, a_2, \cdots, a_n) \in \mathbb{R}^n \text{ or } \mathbb{C}^n\]<p><img src="https://upload.wikimedia.org/wikipedia/commons/5/5d/Position_vector.svg" alt="Position vector" /></p><blockquote><p><em>Image credits</em></p><ul><li>Author: Wikimedia user <a href="https://commons.wikimedia.org/wiki/User:Acdx">Acdx</a><li>License: <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">CC BY-SA 3.0</a></ul></blockquote><h2 id="basic-operations-on-vectors">Basic operations on vectors</h2><p>The two basic operations on vectors are <strong>sum</strong> and <strong>scalar multiplication</strong>.
Every linear operation on vectors can be expressed as a combination of these two.</p><h3 id="vector-addition">Vector addition</h3><p>The sum of two vectors is again a vector; its components are obtained by adding the corresponding components of the two vectors.</p>\[(a_1, a_2, \cdots, a_n) + (b_1, b_2, \cdots, b_n) := (a_1+b_1, a_2+b_2, \cdots, a_n+b_n)\]<h3 id="scalar-multiplication-of-vectors">Scalar multiplication of vectors</h3><p>A vector can be scaled up or down by multiplying it by a scalar (a constant); the result is obtained by multiplying each component by that scalar.</p>\[c(a_1, a_2, \cdots, a_n) := (ca_1, ca_2, \cdots, ca_n)\]<p><img src="https://upload.wikimedia.org/wikipedia/commons/1/1b/Scalar_multiplication_of_vectors2.svg" alt="Scalar multiplication of vectors" /></p><blockquote><p><em>Image credits</em></p><ul><li>Author: Wikipedia user <a href="https://en.wikipedia.org/wiki/User:Silly_rabbit">Silly rabbit</a><li>License: <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">CC BY-SA 3.0</a></ul></blockquote><h2 id="linear-combinations-of-vectors">Linear combinations of vectors</h2><p>Just as calculus starts from numbers $x$ and functions $f(x)$, linear algebra starts from vectors $\mathbf{v}, \mathbf{w}, \dots$ and their linear combinations $c\mathbf{v} + d\mathbf{w} + \cdots$. Every linear combination of vectors is built from the two basic operations above, <a href="#vector-addition">sum</a> and <a href="#scalar-multiplication-of-vectors">scalar multiplication</a>.</p><blockquote class="prompt-info"><p>Given finitely many vectors $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ and scalars $a_1, a_2, \dots, a_n$, a vector $\mathbf{v}$ satisfying</p>\[\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n\]<p>is called a <strong>linear combination</strong> of $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$.
The numbers $a_1, a_2, \dots, a_n$ are the <strong>coefficients</strong> of this linear combination.</p></blockquote><p>Why are linear combinations important? Consider the following situation: <strong>$n$ vectors in $m$-dimensional space form the $n$ columns of an $m \times n$ matrix.</strong></p>\[\begin{gather*} \mathbf{v}_1 = (a_{11}, a_{21}, \dots, a_{m1}), \\ \mathbf{v}_2 = (a_{12}, a_{22}, \dots, a_{m2}), \\ \vdots \\ \mathbf{v}_n = (a_{1n}, a_{2n}, \dots, a_{mn}) \\ \\ A = \Bigg[ \mathbf{v}_1 \quad \mathbf{v}_2 \quad \cdots \quad \mathbf{v}_n \Bigg] \end{gather*}\]<p>The key questions are:</p><ol><li><strong>Describe all possible linear combinations $Ax = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n$.</strong> What do they form?<li>Given a desired output vector $b$, <strong>find numbers $x_1, x_2, \dots, x_n$ such that $Ax = b$.</strong></ol><p>We will return to the second question later; for now, focus on the first. To simplify, consider the case of two nonzero 2D vectors ($m=2$, $n=2$).</p><h3 id="the-linear-combination-cmathbfv--dmathbfw">The linear combination $c\mathbf{v} + d\mathbf{w}$</h3><p>A vector $\mathbf{v}$ in 2D has two components. For any scalar $c$, <u>the vector $c\mathbf{v}$ traces an infinitely long line through the origin in the $xy$-plane, parallel to the original vector $\mathbf{v}$.</u></p><p>If the given second vector $\mathbf{w}$ is not on this line (i.e., $\mathbf{v}$ and $\mathbf{w}$ are not parallel), then $d\mathbf{w}$ traces another line. 
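<p>To make the componentwise picture concrete, here is a minimal sketch in plain Python (my own illustration, not part of the original post; the helper names are hypothetical) that builds the linear combination $c\mathbf{v} + d\mathbf{w}$ from nothing but vector addition and scalar multiplication:</p>

```python
# Minimal sketch (illustration only): 2D vectors as plain Python lists,
# with the two basic operations defined exactly as in the formulas above.

def add(v, w):
    """Componentwise vector addition: (v1+w1, ..., vn+wn)."""
    return [vi + wi for vi, wi in zip(v, w)]

def scale(c, v):
    """Scalar multiplication: (c*v1, ..., c*vn)."""
    return [c * vi for vi in v]

def linear_combination(coeffs, vectors):
    """a1*u1 + ... + an*un, built only from add() and scale()."""
    result = [0] * len(vectors[0])
    for a, u in zip(coeffs, vectors):
        result = add(result, scale(a, u))
    return result

v = [1, 0]
w = [1, 2]  # not parallel to v, so c*v + d*w reaches the whole plane
print(linear_combination([3, 2], [v, w]))  # 3*v + 2*w = [5, 4]
```

<p>With $\mathbf{v} = (1, 0)$ and $\mathbf{w} = (1, 2)$ not parallel, varying the two coefficients reaches every point of the plane.</p>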
Combining these two lines, we see that <strong>the linear combination $c\mathbf{v} + d\mathbf{w}$ fills a single plane that includes the origin.</strong></p><p><img src="https://upload.wikimedia.org/wikipedia/commons/6/6f/Linjcomb.png" alt="Linear combinations of two vectors" /></p><blockquote><p><em>Image credits</em></p><ul><li>Author: Wikimedia user <a href="https://commons.wikimedia.org/wiki/User:Svjo">Svjo</a><li>License: <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">CC BY-SA 4.0</a></ul></blockquote><h3 id="span">Span</h3><p>In this way, linear combinations of vectors form a vector space, a process called <strong>spanning</strong>.</p><blockquote class="prompt-info"><p><strong>Definition</strong><br /> For a nonempty subset $S$ of a vector space $\mathbb{V}$, the set of all linear combinations formed from vectors in $S$ is called the <strong>span</strong> of $S$ and is denoted by $\mathrm{span}(S)$. By definition, $\mathrm{span}(\emptyset) = \{0\}$.</p></blockquote><blockquote class="prompt-info"><p><strong>Definition</strong><br /> For a subset $S$ of a vector space $\mathbb{V}$, if $\mathrm{span}(S) = \mathbb{V}$, then $S$ is said to generate (or span) $\mathbb{V}$.</p></blockquote><p>Although we have not yet introduced concepts such as subspaces and bases, recalling this example will help you understand the concept of a vector space.</p>]]> </content> </entry> <entry><title xml:lang="en">Summary of Kaggle 'Pandas' Course (2) - Lessons 4–6</title><link href="https://www.yunseo.kim/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" 
hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/summary-of-kaggle-pandas-course-2/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-08-24T00:00:00+09:00</published> <updated>2025-08-24T00:00:00+09:00</updated> <id>https://www.yunseo.kim/posts/summary-of-kaggle-pandas-course-2/</id> <author> <name>Yunseo Kim</name> </author> <category term="AI & Data" /> <category term="Machine Learning" /> <summary xml:lang="en">Practical Pandas for data cleaning and wrangling: a concise summary of Kaggle’s free &apos;Pandas&apos; course with added notes. This part covers Lessons 4–6—grouping/sorting, data types &amp; missing values, renaming and combining.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Practical Pandas for data cleaning and wrangling: a concise summary of Kaggle’s free 'Pandas' course with added notes. 
This part covers Lessons 4–6—grouping/sorting, data types & missing values, renaming and combining.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><p>I summarize here what I studied through Kaggle’s <a href="https://www.kaggle.com/learn/pandas">Pandas</a> course.<br /> Since it’s fairly long, I split it into two parts.</p><ul><li><a href="/posts/summary-of-kaggle-pandas-course-1/">Part 1: Lessons 1–3</a><li>Part 2: Lessons 4–6 (this post)</ul><p><img src="/assets/img/kaggle-pandas/certificate.png" alt="Certificate of Completion" /></p><h2 id="lesson-4-grouping-and-sorting">Lesson 4. Grouping and Sorting</h2><p>Sometimes you need to categorize data and perform operations per group, or sort by specific criteria.</p><h3 id="group-wise-analysis">Group-wise analysis</h3><p>Using the <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html"><code class="language-plaintext highlighter-rouge">groupby()</code></a> method, you can group rows sharing the same values in a given column and then compute summaries or apply operations per group.</p><p>Previously, we saw the <a href="/posts/summary-of-kaggle-pandas-course-1/#quick-summaries"><code class="language-plaintext highlighter-rouge">value_counts()</code> method</a>. You can implement the same behavior with <code class="language-plaintext highlighter-rouge">groupby()</code> as follows:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">taster_name</span><span class="sh">'</span><span class="p">).</span><span class="nf">size</span><span class="p">()</span>
</pre></div></div><ol><li>Group the <code class="language-plaintext highlighter-rouge">reviews</code> DataFrame by identical values in the <code class="language-plaintext highlighter-rouge">taster_name</code> column<li>Return a Series of group sizes (number of rows in each group)</ol><p>Or:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">taster_name</span><span class="sh">'</span><span class="p">).</span><span class="n">taster_name</span><span class="p">.</span><span class="nf">count</span><span class="p">()</span>
</pre></div></div><ol><li>Group the <code class="language-plaintext highlighter-rouge">reviews</code> DataFrame by identical values in the <code class="language-plaintext highlighter-rouge">taster_name</code> column<li>Within each group, select the <code class="language-plaintext highlighter-rouge">taster_name</code> column<li>Return a Series with the count of non-missing values</ol><p>In other words, the <code class="language-plaintext highlighter-rouge">value_counts()</code> method is essentially shorthand for the behavior above. Beyond <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html"><code class="language-plaintext highlighter-rouge">count()</code></a>, you can use any summary function similarly. For instance, to find the minimum price per score in the wine data:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">points</span><span class="sh">'</span><span class="p">).</span><span class="n">price</span><span class="p">.</span><span class="nf">min</span><span class="p">()</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre><td class="rouge-code"><pre>points
80      5.0
81      5.0
       ... 
99     44.0
100    80.0
Name: price, Length: 21, dtype: float64
</pre></div></div><ol><li>Group the <code class="language-plaintext highlighter-rouge">reviews</code> DataFrame by identical values in the <code class="language-plaintext highlighter-rouge">points</code> column<li>Within each group, select the <code class="language-plaintext highlighter-rouge">price</code> column<li>Return the minimum value per group as a Series</ol><p>You can also group by multiple columns. To select the highest-rated wine per country and province:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">([</span><span class="sh">'</span><span class="s">country</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">province</span><span class="sh">'</span><span class="p">]).</span><span class="nf">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">df</span><span class="p">:</span> <span class="n">df</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df</span><span class="p">.</span><span class="n">points</span><span class="p">.</span><span class="nf">idxmax</span><span class="p">()])</span>
</pre></div></div><p>Another DataFrameGroupBy method worth knowing is <a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.agg.html"><code class="language-plaintext highlighter-rouge">agg()</code></a>. It lets you run multiple functions per group after grouping.</p><blockquote class="prompt-tip"><p>You can pass as the argument:</p><ul><li>a function<li>a string with the function name<li>a list of functions or function-name strings<li>a dictionary mapping axis labels to a function or list of functions to apply on that axis</ul><p>The function must be able to:</p><ul><li>accept a DataFrame as input, or<li>be a function acceptable to <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html"><code class="language-plaintext highlighter-rouge">DataFrame.apply()</code></a> <a href="/posts/summary-of-kaggle-pandas-course-1/#maps">as covered earlier</a>.</ul><p>This clarification isn’t in the original Kaggle course; I added it based on the official pandas docs.</p></blockquote><p>For example, compute per-country price statistics:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">([</span><span class="sh">'</span><span class="s">country</span><span class="sh">'</span><span class="p">]).</span><span class="n">price</span><span class="p">.</span><span class="nf">agg</span><span class="p">([</span><span class="nb">len</span><span class="p">,</span> <span class="nb">min</span><span class="p">,</span> <span class="nb">max</span><span class="p">])</span>
</pre></div></div><blockquote class="prompt-tip"><p>Here <code class="language-plaintext highlighter-rouge">len</code> refers to Python’s built-in <a href="https://docs.python.org/3/library/functions.html#len"><code class="language-plaintext highlighter-rouge">len()</code></a>. In this example it reports the number of price (<code class="language-plaintext highlighter-rouge">price</code>) entries per group (<code class="language-plaintext highlighter-rouge">country</code>), <u>including missing values</u>. Since it accepts a DataFrame or Series as input, it can be used this way.</p><p>In contrast, pandas’ <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html"><code class="language-plaintext highlighter-rouge">count()</code></a> returns the count of <u>non-missing values only</u>.</p><p>This note isn’t in the original Kaggle course; I added it based on the official Python and pandas documentation.</p></blockquote><h3 id="multiindex">MultiIndex</h3><p>When you perform groupby-based transformations and analyses, you’ll sometimes get a DataFrame with a MultiIndex composed of more than one level.</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span> <span class="o">=</span> <span class="n">reviews</span><span class="p">.</span><span class="nf">groupby</span><span class="p">([</span><span class="sh">'</span><span class="s">country</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">province</span><span class="sh">'</span><span class="p">]).</span><span class="n">description</span><span class="p">.</span><span class="nf">agg</span><span class="p">([</span><span class="nb">len</span><span class="p">])</span>
<span class="n">countries_reviewed</span>
</pre></div></div><table><tr><th><th><th>len<tr><th>country<th>province<th><tr><td rowspan="2">Argentina<td>Mendoza Province<td>3264<tr><td>Other<td>536<tr><td>...<td>...<td>...<tr><td rowspan="2">Uruguay<td>San Jose<td>3<tr><td>Uruguay<td>24</table><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre><td class="rouge-code"><pre><span class="n">mi</span> <span class="o">=</span> <span class="n">countries_reviewed</span><span class="p">.</span><span class="n">index</span>
<span class="nf">type</span><span class="p">(</span><span class="n">mi</span><span class="p">)</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre>pandas.core.indexes.multi.MultiIndex
</pre></div></div><p>A MultiIndex provides methods not present on a simple Index to handle hierarchical structures. For detailed usage and guidelines, see the <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html">MultiIndex / advanced indexing section of the pandas User Guide</a>.</p><p>That said, the method you’ll likely use most often is <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html"><code class="language-plaintext highlighter-rouge">reset_index()</code></a> to flatten back to a regular Index:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">reset_index</span><span class="p">()</span>
</pre></div></div><table><thead><tr><th> <th>country<th>province<th>len<tbody><tr><td>0<td>Argentina<td>Mendoza Province<td>3264<tr><td>1<td>Argentina<td>Other<td>536<tr><td>…<td>…<td>…<td>…<tr><td>423<td>Uruguay<td>San Jose<td>3<tr><td>424<td>Uruguay<td>Uruguay<td>24</table><h3 id="sorting">Sorting</h3><p>Looking at <code class="language-plaintext highlighter-rouge">countries_reviewed</code>, you’ll notice grouped results are returned in index order. That is, the row order of a <code class="language-plaintext highlighter-rouge">groupby</code> result is determined by index values, not by data content.</p><p>When needed, you can sort explicitly using <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html"><code class="language-plaintext highlighter-rouge">sort_values()</code></a>. For example, to sort country–province pairs in ascending order by the number of entries (‘len’):</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span> <span class="o">=</span> <span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">reset_index</span><span class="p">()</span>
<span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="sh">'</span><span class="s">len</span><span class="sh">'</span><span class="p">)</span>
</pre></div></div><table><thead><tr><th> <th>country<th>province<th>len<tbody><tr><td>179<td>Greece<td>Muscat of Kefallonian<td>1<tr><td>192<td>Greece<td>Sterea Ellada<td>1<tr><td>…<td>…<td>…<td>…<tr><td>415<td>US<td>Washington<td>8639<tr><td>392<td>US<td>California<td>36247</table><p><code class="language-plaintext highlighter-rouge">sort_values()</code> sorts ascending by default (low to high), but you can sort descending (high to low) by specifying:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="sh">'</span><span class="s">len</span><span class="sh">'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</pre></div></div><table><thead><tr><th> <th>country<th>province<th>len<tbody><tr><td>392<td>US<td>California<td>36247<tr><td>415<td>US<td>Washington<td>8639<tr><td>…<td>…<td>…<td>…<tr><td>63<td>Chile<td>Coelemu<td>1<tr><td>149<td>Greece<td>Beotia<td>1</table><p>To sort by index instead, use <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html"><code class="language-plaintext highlighter-rouge">sort_index()</code></a>. It accepts the same parameters and has the same default order (ascending) as <code class="language-plaintext highlighter-rouge">sort_values()</code>.</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">sort_index</span><span class="p">()</span>
</pre></div></div><table><thead><tr><th> <th>country<th>province<th>len<tbody><tr><td>0<td>Argentina<td>Mendoza Province<td>3264<tr><td>1<td>Argentina<td>Other<td>536<tr><td>…<td>…<td>…<td>…<tr><td>423<td>Uruguay<td>San Jose<td>3<tr><td>424<td>Uruguay<td>Uruguay<td>24</table><p>Lastly, you can sort by multiple columns at once:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">countries_reviewed</span><span class="p">.</span><span class="nf">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">country</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">len</span><span class="sh">'</span><span class="p">])</span>
</pre></div></div><h2 id="lesson-5-data-types-and-missing-values">Lesson 5. Data Types and Missing Values</h2><p>In practice, data rarely comes perfectly clean. More often than not, column types aren’t what you want and need conversion, and missing values appear throughout and must be handled carefully. For most data workflows, this stage is the biggest hurdle.</p><h3 id="data-types">Data types</h3><p>The data type of a DataFrame column or a Series is its <strong>dtype</strong>. Use the <code class="language-plaintext highlighter-rouge">dtype</code> attribute to check the type of a specific column. For example, to inspect the dtype of the <code class="language-plaintext highlighter-rouge">price</code> column in <code class="language-plaintext highlighter-rouge">reviews</code>:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="n">price</span><span class="p">.</span><span class="n">dtype</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre>dtype('float64')
</pre></div></div><p>Or use the <code class="language-plaintext highlighter-rouge">dtypes</code> attribute to inspect all column dtypes at once:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="n">dtypes</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre><td class="rouge-code"><pre>country        object
description    object
                ...  
variety        object
winery         object
Length: 13, dtype: object
</pre></div></div><p>A dtype reflects how pandas stores data internally. For instance, <code class="language-plaintext highlighter-rouge">float64</code> is a 64-bit floating-point number, and <code class="language-plaintext highlighter-rouge">int64</code> is a 64-bit integer.</p><p>One peculiarity: columns of pure strings don’t have a dedicated string type (in this context) and are treated as generic Python objects (<code class="language-plaintext highlighter-rouge">object</code>).</p><p>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html"><code class="language-plaintext highlighter-rouge">astype()</code></a> to convert a column from one type to another. For example, convert the <code class="language-plaintext highlighter-rouge">points</code> column from <code class="language-plaintext highlighter-rouge">int64</code> to <code class="language-plaintext highlighter-rouge">float64</code>:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="n">points</span><span class="p">.</span><span class="nf">astype</span><span class="p">(</span><span class="sh">'</span><span class="s">float64</span><span class="sh">'</span><span class="p">)</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre><td class="rouge-code"><pre>0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64
</pre></div></div><p>A DataFrame (or Series) index also has a dtype:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="n">index</span><span class="p">.</span><span class="n">dtype</span>
</pre></div></div><div class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre>dtype('int64')
</pre></div></div><p>Pandas also supports “extension” dtypes such as categorical and various time-series types.</p><h3 id="missing-values">Missing values</h3><p>Empty entries are represented as <code class="language-plaintext highlighter-rouge">NaN</code> (short for “Not a Number”). For technical reasons, <code class="language-plaintext highlighter-rouge">NaN</code> is always of dtype <code class="language-plaintext highlighter-rouge">float64</code>.</p><p>Pandas provides helper functions for missing data. <a href="/posts/summary-of-kaggle-pandas-course-1/#conditional-selection">We briefly saw something similar before</a>: in addition to methods, pandas has standalone functions <a href="https://pandas.pydata.org/docs/reference/api/pandas.isna.html"><code class="language-plaintext highlighter-rouge">pd.isna</code></a> and <a href="https://pandas.pydata.org/docs/reference/api/pandas.notna.html"><code class="language-plaintext highlighter-rouge">pd.notna</code></a>. They return a single boolean or a boolean array indicating whether entries are missing (or not), and can be used like this:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">[</span><span class="n">pd</span><span class="p">.</span><span class="nf">isna</span><span class="p">(</span><span class="n">reviews</span><span class="p">.</span><span class="n">country</span><span class="p">)]</span>
</pre></div></div><p>Often you’ll want to detect missing values and then fill them with appropriate replacements. One strategy is to use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html"><code class="language-plaintext highlighter-rouge">fillna()</code></a> to replace <code class="language-plaintext highlighter-rouge">NaN</code>s with a chosen value. For example, replace all <code class="language-plaintext highlighter-rouge">NaN</code> in the <code class="language-plaintext highlighter-rouge">region_2</code> column with <code class="language-plaintext highlighter-rouge">"Unknown"</code>:</p><div class="language-python highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="n">region_2</span><span class="p">.</span><span class="nf">fillna</span><span class="p">(</span><span class="sh">"</span><span class="s">Unknown</span><span class="sh">"</span><span class="p">)</span>
</pre></div></div><p>Alternatively, you can use forward fill or backward fill to propagate the nearest valid value from above or below, via <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.ffill.html"><code class="language-plaintext highlighter-rouge">ffill()</code></a> and <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.bfill.html"><code class="language-plaintext highlighter-rouge">bfill()</code></a>, respectively.</p><blockquote class="prompt-danger"><p>Previously you could pass <code class="language-plaintext highlighter-rouge">'ffill'</code>/<code class="language-plaintext highlighter-rouge">'bfill'</code> to the <code class="language-plaintext highlighter-rouge">method</code> parameter of <code class="language-plaintext highlighter-rouge">fillna()</code>, but this became deprecated starting in pandas 2.1.0. Prefer <code class="language-plaintext highlighter-rouge">ffill()</code> or <code class="language-plaintext highlighter-rouge">bfill()</code> directly instead.</p></blockquote><p>Sometimes you need to replace a value with another even if it’s not missing. The original Kaggle course gives an example of a reviewer changing their Twitter handle. That’s a fine example, but here’s one that may feel more relatable to Korean readers:</p><p>Suppose South Korea split the northern part of Gyeonggi-do and established a new administrative region called <strong>Gyeonggibuk-do</strong>, and you have a dataset reflecting that change. Now imagine someone floated the harebrained idea of renaming <strong>Gyeonggibuk-do</strong> to <strong>Pyeonghwanuri Special Self-Governing Province</strong>, and actually managed to ram it through—a purely hypothetical scenario, of course. 
<del>It’s scary how close something like this might have come to happening.</del> You would then need to replace <code class="language-plaintext highlighter-rouge">"Gyeonggibuk-do"</code> with a new value like <code class="language-plaintext highlighter-rouge">"Pyeonghwanuri State"</code> or <code class="language-plaintext highlighter-rouge">"Pyeonghwanuri Special Self-Governing Province"</code> in the dataset. One way to do this in pandas is with <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html"><code class="language-plaintext highlighter-rouge">replace()</code></a>:</p><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">rok_2030_census</span><span class="p">.</span><span class="n">province</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="sh">"</span><span class="s">Gyeonggibuk-do</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Pyeonghwanuri Special Self-Governing Province</span><span class="sh">"</span><span class="p">)</span>
</pre></div></div><p>With this snippet, you can effectively bulk-replace every <code class="language-plaintext highlighter-rouge">"Gyeonggibuk-do"</code> string in the <code class="language-plaintext highlighter-rouge">province</code> column of the <code class="language-plaintext highlighter-rouge">rok_2030_census</code> dataset with ‘that long one’. <del>It’s a relief no one actually had to run code like this in real life.</del></p><p>String replacement is also useful during cleaning, since missingness is often encoded as strings like <code class="language-plaintext highlighter-rouge">"Unknown"</code>, <code class="language-plaintext highlighter-rouge">"Undisclosed"</code>, or <code class="language-plaintext highlighter-rouge">"Invalid"</code> rather than <code class="language-plaintext highlighter-rouge">NaN</code>. In real-world workflows such as OCR-ing old official documents into datasets, this may be the norm rather than the exception.</p><h2 id="lesson-6-renaming-and-combining">Lesson 6. Renaming and Combining</h2><p>Sometimes you need to rename specific columns or index labels in a dataset. You’ll also frequently have to combine multiple DataFrames or Series.</p><h3 id="renaming">Renaming</h3><p>Use <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html"><code class="language-plaintext highlighter-rouge">rename()</code></a> to rename columns or index labels. It supports various input formats, but a Python dictionary is usually the most convenient. 
The following examples rename the <code class="language-plaintext highlighter-rouge">points</code> column to <code class="language-plaintext highlighter-rouge">score</code> and relabel index entries <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">1</code> to <code class="language-plaintext highlighter-rouge">firstEntry</code> and <code class="language-plaintext highlighter-rouge">secondEntry</code>:</p><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="sh">'</span><span class="s">points</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">score</span><span class="sh">'</span><span class="p">})</span>
</pre></div></div><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">rename</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">{</span><span class="mi">0</span><span class="p">:</span> <span class="sh">'</span><span class="s">firstEntry</span><span class="sh">'</span><span class="p">,</span> <span class="mi">1</span><span class="p">:</span> <span class="sh">'</span><span class="s">secondEntry</span><span class="sh">'</span><span class="p">})</span>
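The two renames can also be combined into a single call; a minimal sketch with a stand-in frame containing just the relevant column:

```python
import pandas as pd

# stand-in frame with only the column being renamed
reviews = pd.DataFrame({"points": [87, 90]})

# columns= and index= mappings can be passed together in one rename()
renamed = reviews.rename(columns={"points": "score"},
                         index={0: "firstEntry", 1: "secondEntry"})
```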
</pre></div></div><p>In practice, renaming columns is common, while renaming index values is rare; for that purpose, it’s usually more convenient to use <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html"><code class="language-plaintext highlighter-rouge">set_index()</code></a> <a href="/posts/summary-of-kaggle-pandas-course-1/#manipulating-the-index">as we saw earlier</a>.</p><p>Both the row and column axes have a <code class="language-plaintext highlighter-rouge">name</code> attribute. You can rename these axis names with <code class="language-plaintext highlighter-rouge">rename_axis()</code>. For example, label the row axis as <code class="language-plaintext highlighter-rouge">wines</code> and the column axis as <code class="language-plaintext highlighter-rouge">fields</code>:</p><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre><td class="rouge-code"><pre><span class="n">reviews</span><span class="p">.</span><span class="nf">rename_axis</span><span class="p">(</span><span class="sh">"</span><span class="s">wines</span><span class="sh">"</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="sh">'</span><span class="s">index</span><span class="sh">'</span><span class="p">).</span><span class="nf">rename_axis</span><span class="p">(</span><span class="sh">"</span><span class="s">fields</span><span class="sh">"</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="sh">'</span><span class="s">columns</span><span class="sh">'</span><span class="p">)</span>
</pre></div></div><h3 id="combining-datasets">Combining datasets</h3><p>You’ll often need to combine DataFrames or Series. Pandas provides three core tools for this, from simplest to most flexible: <a href="https://pandas.pydata.org/docs/reference/api/pandas.concat.html"><code class="language-plaintext highlighter-rouge">concat()</code></a>, <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html"><code class="language-plaintext highlighter-rouge">join()</code></a>, and <a href="https://pandas.pydata.org/docs/reference/api/pandas.merge.html"><code class="language-plaintext highlighter-rouge">merge()</code></a>. The Kaggle course focuses on the first two, noting that most <code class="language-plaintext highlighter-rouge">merge()</code> tasks can be done more simply with <code class="language-plaintext highlighter-rouge">join()</code>.</p><p><code class="language-plaintext highlighter-rouge">concat()</code> is the simplest: it stitches multiple DataFrames or Series along a given axis. It’s handy when the objects share the same fields (columns). By default, it concatenates along the index axis; specify <code class="language-plaintext highlighter-rouge">axis=1</code> or <code class="language-plaintext highlighter-rouge">axis='columns'</code> to concatenate along columns.</p><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre><td class="rouge-code"><pre><span class="o">&gt;&gt;&gt;</span> <span class="n">s1</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">Series</span><span class="p">([</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">b</span><span class="sh">'</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">s2</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">Series</span><span class="p">([</span><span class="sh">'</span><span class="s">c</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">d</span><span class="sh">'</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="nf">concat</span><span class="p">([</span><span class="n">s1</span><span class="p">,</span> <span class="n">s2</span><span class="p">])</span>
<span class="mi">0</span>    <span class="n">a</span>
<span class="mi">1</span>    <span class="n">b</span>
<span class="mi">0</span>    <span class="n">c</span>
<span class="mi">1</span>    <span class="n">d</span>
<span class="n">dtype</span><span class="p">:</span> <span class="nb">object</span>
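If the duplicated 0, 1 labels in the output above are unwanted, ignore_index=True renumbers the result; a sketch reusing the same two Series:

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# ignore_index=True discards the original labels and renumbers from 0
combined = pd.concat([s1, s2], ignore_index=True)
```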
</pre></div></div><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
</pre><td class="rouge-code"><pre><span class="o">&gt;&gt;&gt;</span> <span class="n">df1</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">([[</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="sh">'</span><span class="s">b</span><span class="sh">'</span><span class="p">,</span> <span class="mi">2</span><span class="p">]],</span>
<span class="p">...</span>                    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">letter</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">number</span><span class="sh">'</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df1</span>
  <span class="n">letter</span>  <span class="n">number</span>
<span class="mi">0</span>      <span class="n">a</span>       <span class="mi">1</span>
<span class="mi">1</span>      <span class="n">b</span>       <span class="mi">2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df2</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">([[</span><span class="sh">'</span><span class="s">c</span><span class="sh">'</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="sh">'</span><span class="s">d</span><span class="sh">'</span><span class="p">,</span> <span class="mi">4</span><span class="p">]],</span>
<span class="p">...</span>                    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">letter</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">number</span><span class="sh">'</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df2</span>
  <span class="n">letter</span>  <span class="n">number</span>
<span class="mi">0</span>      <span class="n">c</span>       <span class="mi">3</span>
<span class="mi">1</span>      <span class="n">d</span>       <span class="mi">4</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="nf">concat</span><span class="p">([</span><span class="n">df1</span><span class="p">,</span> <span class="n">df2</span><span class="p">])</span>
  <span class="n">letter</span>  <span class="n">number</span>
<span class="mi">0</span>      <span class="n">a</span>       <span class="mi">1</span>
<span class="mi">1</span>      <span class="n">b</span>       <span class="mi">2</span>
<span class="mi">0</span>      <span class="n">c</span>       <span class="mi">3</span>
<span class="mi">1</span>      <span class="n">d</span>       <span class="mi">4</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df4</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">([[</span><span class="sh">'</span><span class="s">bird</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">polly</span><span class="sh">'</span><span class="p">],</span> <span class="p">[</span><span class="sh">'</span><span class="s">monkey</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">george</span><span class="sh">'</span><span class="p">]],</span>
<span class="p">...</span>                    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">animal</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df4</span>
   <span class="n">animal</span>    <span class="n">name</span>
<span class="mi">0</span>    <span class="n">bird</span>   <span class="n">polly</span>
<span class="mi">1</span>  <span class="n">monkey</span>  <span class="n">george</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="nf">concat</span><span class="p">([</span><span class="n">df1</span><span class="p">,</span> <span class="n">df4</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">letter</span>  <span class="n">number</span>  <span class="n">animal</span>    <span class="n">name</span>
<span class="mi">0</span>      <span class="n">a</span>       <span class="mi">1</span>    <span class="n">bird</span>   <span class="n">polly</span>
<span class="mi">1</span>      <span class="n">b</span>       <span class="mi">2</span>  <span class="n">monkey</span>  <span class="n">george</span>
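When rows accumulate one at a time (e.g. in a loop), collecting them in a list and concatenating once at the end is much faster than repeated appends; a minimal sketch:

```python
import pandas as pd

# collect per-row frames in a list, then concatenate once at the end
rows = [pd.DataFrame([[letter, number]], columns=['letter', 'number'])
        for letter, number in [('a', 1), ('b', 2), ('c', 3)]]
result = pd.concat(rows, ignore_index=True)
```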
</pre></div></div><blockquote class="prompt-tip"><p>According to the <a href="https://pandas.pydata.org/docs/reference/api/pandas.concat.html">pandas docs</a>, when building a DataFrame from many rows, avoid appending rows one by one in a loop. Instead, collect the rows in a list and perform a single <code class="language-plaintext highlighter-rouge">concat()</code>.</p></blockquote><p><code class="language-plaintext highlighter-rouge">join()</code> is more complex: it attaches another DataFrame to a base DataFrame by aligning on the index. If the two DataFrames have overlapping column names, you must specify <code class="language-plaintext highlighter-rouge">lsuffix</code> and <code class="language-plaintext highlighter-rouge">rsuffix</code> to disambiguate them.</p><div class="language-python highlighter-rouge"><div class="highlight"><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre><td class="rouge-code"><pre><span class="o">&gt;&gt;&gt;</span> <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">({</span><span class="sh">'</span><span class="s">key</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">K0</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K2</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K3</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K4</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K5</span><span class="sh">'</span><span class="p">],</span>
<span class="p">...</span>                    <span class="sh">'</span><span class="s">A</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">A0</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A2</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A3</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A4</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A5</span><span class="sh">'</span><span class="p">]})</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df</span>
  <span class="n">key</span>   <span class="n">A</span>
<span class="mi">0</span>  <span class="n">K0</span>  <span class="n">A0</span>
<span class="mi">1</span>  <span class="n">K1</span>  <span class="n">A1</span>
<span class="mi">2</span>  <span class="n">K2</span>  <span class="n">A2</span>
<span class="mi">3</span>  <span class="n">K3</span>  <span class="n">A3</span>
<span class="mi">4</span>  <span class="n">K4</span>  <span class="n">A4</span>
<span class="mi">5</span>  <span class="n">K5</span>  <span class="n">A5</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">other</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">({</span><span class="sh">'</span><span class="s">key</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">K0</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">K2</span><span class="sh">'</span><span class="p">],</span>
<span class="p">...</span>                       <span class="sh">'</span><span class="s">B</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">B0</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">B1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">B2</span><span class="sh">'</span><span class="p">]})</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">other</span>
  <span class="n">key</span>   <span class="n">B</span>
<span class="mi">0</span>  <span class="n">K0</span>  <span class="n">B0</span>
<span class="mi">1</span>  <span class="n">K1</span>  <span class="n">B1</span>
<span class="mi">2</span>  <span class="n">K2</span>  <span class="n">B2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">df</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">lsuffix</span><span class="o">=</span><span class="sh">'</span><span class="s">_caller</span><span class="sh">'</span><span class="p">,</span> <span class="n">rsuffix</span><span class="o">=</span><span class="sh">'</span><span class="s">_other</span><span class="sh">'</span><span class="p">)</span>
  <span class="n">key_caller</span>   <span class="n">A</span> <span class="n">key_other</span>    <span class="n">B</span>
<span class="mi">0</span>         <span class="n">K0</span>  <span class="n">A0</span>        <span class="n">K0</span>   <span class="n">B0</span>
<span class="mi">1</span>         <span class="n">K1</span>  <span class="n">A1</span>        <span class="n">K1</span>   <span class="n">B1</span>
<span class="mi">2</span>         <span class="n">K2</span>  <span class="n">A2</span>        <span class="n">K2</span>   <span class="n">B2</span>
<span class="mi">3</span>         <span class="n">K3</span>  <span class="n">A3</span>       <span class="n">NaN</span>  <span class="n">NaN</span>
<span class="mi">4</span>         <span class="n">K4</span>  <span class="n">A4</span>       <span class="n">NaN</span>  <span class="n">NaN</span>
<span class="mi">5</span>         <span class="n">K5</span>  <span class="n">A5</span>       <span class="n">NaN</span>  <span class="n">NaN</span>
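To align on the key column itself instead of the positional index, the pandas docs suggest setting it as the index on both frames first (no suffixes needed then, since the overlapping column becomes the join key); a sketch with the same df and other:

```python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})

# set 'key' as the index on both sides so join() matches on it
joined = df.set_index('key').join(other.set_index('key'))
```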
</pre></div></div>]]> </content> </entry> <entry><title xml:lang="en">Web Performance Metrics (Web Vitals)</title><link href="https://www.yunseo.kim/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/about-web-vitals/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-08-05T00:00:00+09:00</published> <updated>2025-08-28T18:22:07+09:00</updated> <id>https://www.yunseo.kim/posts/about-web-vitals/</id> <author> <name>Yunseo Kim</name> </author> <category term="Dev" /> <category term="Web Dev" /> <summary xml:lang="en">Overview of Web Vitals and Lighthouse scoring—what each metric means, how it’s measured, and target thresholds for LCP, INP, CLS, TBT, FCP, and Speed Index to improve performance.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Overview of Web Vitals and Lighthouse scoring—what each metric means, how it’s measured, and target thresholds for LCP, INP, CLS, TBT, FCP, and Speed Index to improve performance.</p><em><p>* Mathematical equations and diagrams included in posts may 
not display properly when viewed with a feed reader.</p></em><h2 id="factors-that-determine-web-performance">Factors that determine web performance</h2><p>Broadly, the factors that determine web performance to consider during optimization fall into two categories: loading performance and rendering performance.</p><h3 id="html-loading-performance">HTML loading performance</h3><ul><li>The time from the initial page request over the network to when the browser receives the HTML document and starts rendering<li>Determines how quickly the page starts to display<li>Optimize by minimizing redirects, caching HTML responses, compressing resources, and using an appropriate CDN</ul><h3 id="rendering-performance">Rendering performance</h3><ul><li>The time it takes the browser to paint what users see and make it interactive<li>Determines how smoothly and quickly the screen is drawn<li>Optimize by removing unnecessary CSS and JS, avoiding delayed loading of fonts and thumbnails, offloading heavy computations to a separate Web Worker to minimize main-thread occupancy, and optimizing animations</ul><h2 id="web-performance-metrics-web-vitals">Web Performance Metrics (Web Vitals)</h2><p>This post follows Google’s <a href="https://web.dev/performance?hl=en">web.dev</a> and the <a href="https://developer.chrome.com/docs/lighthouse/performance/performance-scoring?hl=en">Chrome Developers docs</a>. Unless there’s a special reason, aim for overall improvement rather than focusing on a single metric, and identify which part of the target page is the performance bottleneck. If you have real-user data, it’s better to focus on lower-quartile (Q1) values rather than the top or average, and verify that your targets are still met in those cases and improve accordingly.</p><h3 id="core-web-vitals">Core Web Vitals</h3><p>As we’ll cover shortly, there are many Web Vitals. 
Among them, Google highlights three metrics that are tightly tied to user experience and can be measured in the field rather than only in lab conditions; these are called the <a href="https://web.dev/articles/vitals?hl=en#core-web-vitals">Core Web Vitals</a>. Because Google incorporates Core Web Vitals into its search ranking, site owners should pay close attention to these for SEO.</p><ul><li><a href="#lcp-largest-contentful-paint">Largest Contentful Paint (LCP)</a>: reflects <em>loading performance</em>; should be within 2.5 s<li><a href="https://web.dev/articles/inp?hl=en">Interaction to Next Paint (INP)</a>: reflects <em>responsiveness</em>; should be ≤ 200 ms<li><a href="#cls-cumulative-layout-shift">Cumulative Layout Shift (CLS)</a>: reflects <em>visual stability</em>; should be ≤ 0.1</ul><p>Core Web Vitals are primarily field metrics, but the other two besides INP can also be measured in lab tools like Chrome DevTools or Lighthouse. INP requires actual user input, so it can’t be measured in a lab; in such cases, <a href="#tbt-total-blocking-time">TBT</a> is highly correlated with INP and serves as a close proxy, and <a href="https://web.dev/articles/vitals?hl=en#lab_tools_to_measure_core_web_vitals">improving TBT usually improves INP as well</a>.</p><h3 id="performance-score-weights-in-lighthouse-10">Performance score weights in Lighthouse 10</h3><p><a href="https://developer.chrome.com/docs/lighthouse/performance/performance-scoring?hl=en">The Lighthouse performance score is a weighted average of metric scores, using the following weights</a>.</p><table><thead><tr><th>Metric<th>Weight<tbody><tr><td><a href="#fcp-first-contentful-paint">First Contentful Paint</a><td>10%<tr><td><a href="#si-speed-index">Speed Index</a><td>10%<tr><td><a href="#lcp-largest-contentful-paint">Largest Contentful Paint</a><td>25%<tr><td><a href="#tbt-total-blocking-time">Total Blocking Time</a><td>30%<tr><td><a href="#cls-cumulative-layout-shift">Cumulative Layout 
Shift</a><td>25%</table><h3 id="fcp-first-contentful-paint">FCP (First Contentful Paint)</h3><ul><li>Measures the time from page request to the first render of DOM content<li>Counts images, non-white <code class="language-plaintext highlighter-rouge">&lt;canvas&gt;</code> elements, and SVG as DOM content; excludes content inside <code class="language-plaintext highlighter-rouge">iframe</code>s</ul><blockquote class="prompt-tip"><p>One factor that significantly affects FCP is font loading. For optimization tips, the <a href="https://developer.chrome.com/docs/lighthouse/performance/first-contentful-paint/?hl=en">Chrome Developers docs</a> recommend this <a href="https://developer.chrome.com/docs/lighthouse/performance/font-display?hl=en">related post</a>.</p></blockquote><h4 id="lighthouse-scoring-thresholds">Lighthouse scoring thresholds</h4><p>According to the <a href="https://developer.chrome.com/docs/lighthouse/performance/first-contentful-paint/?hl=en">Chrome Developers docs</a>, Lighthouse uses the following thresholds:</p><table><thead><tr><th>Color rating<th>Mobile FCP (s)<th>Desktop FCP (s)<tbody><tr><td>Green (fast)<td>0–1.8<td>0–0.9<tr><td>Orange (moderate)<td>1.8–3<td>0.9–1.6<tr><td>Red (slow)<td>&gt; 3<td>&gt; 1.6</table><h3 id="lcp-largest-contentful-paint">LCP (Largest Contentful Paint)</h3><ul><li>Measures the time it takes to render the largest element (image, text block, video, etc.) 
within the initial viewport when the page first opens<li>The larger the on-screen area it occupies, the more likely users will perceive it as primary content<li>If the LCP is an image, you can break the time down into four sub-intervals; identify where the bottleneck occurs:<ol><li>Time to First Byte (TTFB): time from the start of page load to receipt of the first byte of the HTML response<li>Load delay: the difference between when the browser starts loading the LCP resource and the TTFB<li>Load time: the time to load the LCP resource itself<li>Render delay: the time from finishing the LCP resource load until the LCP element is fully rendered</ol></ul><h4 id="lighthouse-scoring-thresholds-1">Lighthouse scoring thresholds</h4><p>According to the <a href="https://developer.chrome.com/docs/lighthouse/performance/lighthouse-largest-contentful-paint/?hl=en">Chrome Developers docs</a>, Lighthouse uses the following thresholds:</p><table><thead><tr><th>Color rating<th>Mobile LCP (s)<th>Desktop LCP (s)<tbody><tr><td>Green (fast)<td>0–2.5<td>0–1.2<tr><td>Orange (moderate)<td>2.5–4<td>1.2–2.4<tr><td>Red (slow)<td>&gt; 4<td>&gt; 2.4</table><h3 id="tbt-total-blocking-time">TBT (Total Blocking Time)</h3><ul><li>Measures the total time the page is unable to respond to user input such as mouse clicks, touches, and key presses<li>Among the tasks between FCP and <a href="https://developer.chrome.com/docs/lighthouse/performance/interactive?hl=en">TTI (Time to Interactive)</a>*, tasks that run for ≥ 50 ms are considered <a href="https://web.dev/articles/long-tasks-devtools?hl=en">long tasks</a>. 
For each long task, the time beyond 50 ms is called the <em>blocking portion</em>, and TBT is the sum of all blocking portions.</ul><blockquote class="prompt-info"><p>* TTI itself is overly sensitive to outliers in network responses and long tasks, leading to low consistency and high variance, <a href="https://developer.chrome.com/blog/lighthouse-10-0#scoring-changes">so it was removed from Lighthouse scoring starting with Lighthouse 10</a>.</p></blockquote><blockquote class="prompt-tip"><p>The most common causes of long tasks are unnecessary or inefficient JavaScript loading, parsing, and execution. The <a href="https://developer.chrome.com/docs/lighthouse/performance/lighthouse-total-blocking-time/?hl=en">Chrome Developers docs</a> and <a href="https://web.dev/articles/long-tasks-devtools?hl=en#what_is_causing_my_long_tasks">Google’s web.dev</a> recommend reducing JavaScript payload via <a href="https://web.dev/articles/reduce-javascript-payloads-with-code-splitting?hl=en">code splitting</a> so each chunk runs within 50 ms, and, if needed, offloading work from the main thread to a separate Web Worker to run in multiple threads.</p></blockquote><h4 id="lighthouse-scoring-thresholds-2">Lighthouse scoring thresholds</h4><p>According to the <a href="https://developer.chrome.com/docs/lighthouse/performance/lighthouse-total-blocking-time/?hl=en">Chrome Developers docs</a>, Lighthouse uses the following thresholds:</p><table><thead><tr><th>Color rating<th>Mobile TBT (ms)<th>Desktop TBT (ms)<tbody><tr><td>Green (fast)<td>0–200<td>0–150<tr><td>Orange (moderate)<td>200–600<td>150–350<tr><td>Red (slow)<td>&gt; 600<td>&gt; 350</table><h3 id="cls-cumulative-layout-shift">CLS (Cumulative Layout Shift)</h3><p> <video class="embed-video file" controls="" autoplay="" loop=""> <source src="https://web.dev/static/articles/cls/video/web-dev-assets/layout-instability-api/layout-instability2.webm" type="video/webm" /> Your browser does not support the video tag. 
Here is a <a href="https://web.dev/static/articles/cls/video/web-dev-assets/layout-instability-api/layout-instability2.webm">link to the video file</a> instead. </video> <em>An example of an unexpected layout shift</em></p><blockquote><p>Video source: <a href="https://web.dev/articles/cls?hl=en">Cumulative Layout Shift (CLS) | Articles | web.dev</a></p></blockquote><p><del>I sense deep rage in that cursor movement</del></p><ul><li>Unexpected layout shifts degrade UX in many ways, such as suddenly moving text that causes readers to lose their place, or misclicks on links and buttons<li>The exact method for calculating the CLS score is described on <a href="https://web.dev/articles/cls">Google’s web.dev</a><li>As shown in the image below, you should target ≤ 0.1</ul><p><img src="https://web.dev/static/articles/cls/image/good-cls-values.svg" alt="What is a good CLS score?" width="640" height="480" /></p><blockquote><p>Image source: <a href="https://web.dev/articles/cls?hl=en#what-is-a-good-cls-score">Cumulative Layout Shift (CLS) | Articles | web.dev</a></p></blockquote><h3 id="si-speed-index">SI (Speed Index)</h3><ul><li>Measures how quickly content is visually displayed during page load<li>Lighthouse records a video of the page loading in the browser, analyzes it to compute frame-by-frame progression, and then uses the <a href="https://github.com/paulirish/speedline">Speedline Node.js module</a> to compute the SI score</ul><blockquote class="prompt-tip"><p>Any improvement that speeds up page loading—including what we covered for <a href="#fcp-first-contentful-paint">FCP</a>, <a href="#lcp-largest-contentful-paint">LCP</a>, and <a href="#tbt-total-blocking-time">TBT</a>—will generally improve the SI score as well. 
Rather than representing a single stage of loading, SI reflects the overall loading process to some extent.</p></blockquote><h4 id="lighthouse-scoring-thresholds-3">Lighthouse scoring thresholds</h4><p>According to the <a href="https://developer.chrome.com/docs/lighthouse/performance/speed-index/?hl=en">Chrome Developers docs</a>, Lighthouse uses the following thresholds:</p><table><thead><tr><th>Color rating<th>Mobile SI (s)<th>Desktop SI (s)<tbody><tr><td>Green (fast)<td>0–3.4<td>0–1.3<tr><td>Orange (moderate)<td>3.4–5.8<td>1.3–2.3<tr><td>Red (slow)<td>&gt; 5.8<td>&gt; 2.3</table>]]> </content> </entry> <entry><title xml:lang="en">Gravitational Field and Gravitational Potential</title><link href="https://www.yunseo.kim/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/gravitational-field-and-potential/" rel="alternate" type="text/html" hreflang="cs" 
/><published>2025-05-17T00:00:00+09:00</published> <updated>2025-10-30T12:44:56+09:00</updated> <id>https://www.yunseo.kim/posts/gravitational-field-and-potential/</id> <author> <name>Yunseo Kim</name> </author> <category term="Physics" /> <category term="Classical Dynamics" /> <summary xml:lang="en">Learn about the definition of gravitational field vectors and gravitational potential according to Newton&apos;s law of universal gravitation, and examine two important related examples: the shell theorem and galactic rotation curves.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Learn about the definition of gravitational field vectors and gravitational potential according to Newton's law of universal gravitation, and examine two important related examples: the shell theorem and galactic rotation curves.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li>Newton’s law of universal gravitation: $\mathbf{F} = -G\cfrac{mM}{r^2}\mathbf{e}_r$<li>For objects with continuous mass distribution and finite size: $\mathbf{F} = -Gm\int_V \cfrac{dM}{r^2}\mathbf{e}_r = -Gm\int_V \cfrac{\rho(\mathbf{r^\prime})\mathbf{e}_r}{r^2} dv^{\prime}$<ul><li>$\rho(\mathbf{r^{\prime}})$: mass density at a point located at position vector $\mathbf{r^{\prime}}$ from an arbitrary origin<li>$dv^{\prime}$: volume element at a point located at position vector $\mathbf{r^{\prime}}$ from an arbitrary origin</ul><li><strong>Gravitational field vector</strong>:<ul><li>A vector representing the force per unit mass experienced by a particle in the field created by an object of mass $M$<li>$\mathbf{g} = \cfrac{\mathbf{F}}{m} = - G \cfrac{M}{r^2}\mathbf{e}_r = - G \int_V \cfrac{\rho(\mathbf{r^\prime})\mathbf{e}_r}{r^2}dv^\prime$<li>Has dimensions of <em>force per unit mass</em> or <em>acceleration</em></ul><li><strong>Gravitational 
potential</strong>:<ul><li>$\mathbf{g} \equiv -\nabla \Phi$<li>Has dimensions of (<em>force per unit mass</em>) × (<em>distance</em>) or <em>energy per unit mass</em><li>$\Phi = -G\cfrac{M}{r}$<li>Only the relative difference in gravitational potential has meaning; the specific value itself is meaningless<li>Usually the condition $\Phi \to 0$ as $r \to \infty$ is arbitrarily set to remove ambiguity<li>$U = m\Phi, \quad \mathbf{F} = -\nabla U$</ul><li><strong>Gravitational potential inside and outside a spherical shell (Shell theorem)</strong><ul><li>When $R&gt;a$:<ul><li>$\Phi(R&gt;a) = -\cfrac{GM}{R}$<li>When calculating the gravitational potential at any external point due to a spherically symmetric mass distribution, the object can be treated as a point mass</ul><li>When $R&lt;b$:<ul><li>$\Phi(R&lt;b) = -2\pi\rho G(a^2 - b^2)$<li>Inside a spherically symmetric mass shell, the gravitational potential is constant regardless of position, and the gravitational force is $0$</ul><li>When $b&lt;R&lt;a$: $\Phi(b&lt;R&lt;a) = -4\pi\rho G \left( \cfrac{a^2}{2} - \cfrac{b^3}{3R} - \cfrac{R^2}{6} \right)$</ul></ul></blockquote><h2 id="gravitational-field">Gravitational Field</h2><h3 id="newtons-law-of-universal-gravitation">Newton’s Law of Universal Gravitation</h3><p>Newton had already systematized and numerically verified the law of universal gravitation before 11666 HE. Nevertheless, it took another 20 years until he published his results in his book <em>Principia</em> in 11687 HE, because he could not justify the calculation method that assumed the Earth and Moon as point masses without size. 
Fortunately, <a href="#when-ra">using the calculus that Newton himself invented, we can now solve that problem, which was not easy for Newton in the 1600s, far more easily</a>.</p><p>According to Newton’s law of universal gravitation, <em>every mass particle attracts every other particle in the universe with a force that is proportional to the product of their masses and inversely proportional to the square of the distance between them.</em> Mathematically, this is expressed as:</p>\[\mathbf{F} = -G\frac{mM}{r^2}\mathbf{e}_r \label{eqn:law_of_gravitation}\tag{1}\]<p><img src="https://upload.wikimedia.org/wikipedia/commons/0/0e/NewtonsLawOfUniversalGravitation.svg" alt="Newton's law of universal gravitation" /></p><blockquote><p><em>Image source</em></p><ul><li>Author: Wikimedia user <a href="https://commons.wikimedia.org/wiki/User:Dna-webmaster">Dennis Nilsson</a><li>License: <a href="https://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a></ul></blockquote><p>The unit vector $\mathbf{e}_r$ points from $M$ toward $m$, and the negative sign indicates that the force is attractive. That is, $m$ is pulled toward $M$.</p><h3 id="cavendishs-experiment">Cavendish’s Experiment</h3><p>The experimental verification of this law and the determination of the value of $G$ were accomplished by British physicist Henry Cavendish in 11798 HE. Cavendish’s experiment used a torsion balance consisting of two small spheres fixed to the ends of a light rod. These two spheres were each attracted toward two other large spheres positioned nearby. The currently accepted value of $G$ is $6.673 \pm 0.010 \times 10^{-11} \mathrm{N\cdot m^2/kg^2}$.</p><blockquote class="prompt-tip"><p>Despite $G$ being one of the oldest known fundamental constants, it is known with lower precision than most other fundamental constants such as $e$, $c$, and $\hbar$. 
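</p><p>One way to see why measuring $G$ is so hard: the gravitational force from Eq. (1) between everyday objects is minuscule, which is why Cavendish needed such a sensitive torsion balance. A quick check (the two 1 kg masses and 1 m separation are illustrative values, not from the post):</p>

```python
G = 6.674e-11  # gravitational constant in N·m²/kg² (commonly cited value)

# Attractive force between two 1 kg spheres 1 m apart, from Eq. (1):
F = G * 1.0 * 1.0 / 1.0**2
# ~6.7e-11 N, roughly the weight of a few nanograms of matter
assert 6.6e-11 < F < 6.8e-11
```

<p>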
Even today, much research is being conducted to determine the value of $G$ with higher precision.</p></blockquote><h3 id="for-objects-with-finite-size">For Objects with Finite Size</h3><p>The law in equation ($\ref{eqn:law_of_gravitation}$) can strictly only be applied to <em>point particles</em>. If one or both objects have finite size, we need the additional assumption that the gravitational force field is a <em>linear field</em> to calculate the force. That is, we assume that the total gravitational force on a particle of mass $m$ from several other particles can be found by vector addition of each force. For objects with continuous mass distribution, the sum is replaced by an integral:</p>\[\mathbf{F} = -Gm\int_V \frac{dM}{r^2}\mathbf{e}_r = -Gm\int_V \frac{\rho(\mathbf{r^\prime})\mathbf{e}_r}{r^2} dv^{\prime} \label{eqn:integral_form}\tag{2}\]<ul><li>$\rho(\mathbf{r^{\prime}})$: mass density at a point located at position vector $\mathbf{r^{\prime}}$ from an arbitrary origin<li>$dv^{\prime}$: volume element at a point located at position vector $\mathbf{r^{\prime}}$ from an arbitrary origin</ul><p>If both objects of mass $M$ and mass $m$ have finite size, a second volume integral over $m$ is also needed to find the total gravitational force.</p><h3 id="gravitational-field-vector">Gravitational Field Vector</h3><p>The <strong>gravitational field vector</strong> $\mathbf{g}$ is defined as the vector representing the force per unit mass experienced by a particle in the field created by an object of mass $M$:</p>\[\mathbf{g} = \frac{\mathbf{F}}{m} = - G \frac{M}{r^2}\mathbf{e}_r \label{eqn:g_vector}\tag{3}\]<p>or</p>\[\boxed{\mathbf{g} = - G \int_V \frac{\rho(\mathbf{r^\prime})\mathbf{e}_r}{r^2}dv^\prime} \tag{4}\]<p>Here, the direction of $\mathbf{e}_r$ varies with $\mathbf{r^\prime}$.</p><p>This quantity $\mathbf{g}$ has dimensions of <em>force per unit mass</em> or <em>acceleration</em>. 
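</p><p>As a quick numerical sanity check of Eq. (3), we can evaluate $|\mathbf{g}| = GM/r^2$ at the Earth's surface (a sketch; the Earth mass and radius figures below are commonly cited values, not taken from this post):</p>

```python
G = 6.674e-11        # gravitational constant, N·m²/kg²
M_earth = 5.972e24   # Earth's mass, kg (assumed figure)
R_earth = 6.371e6    # Earth's mean radius, m (assumed figure)

# |g| = G M / r², Eq. (3), evaluated at the Earth's surface
g_surface = G * M_earth / R_earth**2
assert 9.7 < g_surface < 9.9  # close to the familiar 9.80 m/s²
```

<p>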
The magnitude of the gravitational field vector $\mathbf{g}$ near the Earth’s surface is equal to what we call the <strong>gravitational acceleration constant</strong>, with $|\mathbf{g}| \approx 9.80\mathrm{m/s^2}$.</p><h2 id="gravitational-potential">Gravitational Potential</h2><h3 id="definition">Definition</h3><p>The gravitational field vector $\mathbf{g}$ varies as $1/r^2$, and therefore satisfies the condition ($\nabla \times \mathbf{g} \equiv 0$) for being expressible as the gradient of some scalar function (potential). Thus we can write:</p>\[\mathbf{g} \equiv -\nabla \Phi \label{eqn:gradient_phi}\tag{5}\]<p>where $\Phi$ is called the <strong>gravitational potential</strong>, and has dimensions of (<em>force per unit mass</em>) × (<em>distance</em>) or <em>energy per unit mass</em>.</p><p>Since $\mathbf{g}$ depends only on the radius, $\Phi$ also varies with $r$. From equations ($\ref{eqn:g_vector}$) and ($\ref{eqn:gradient_phi}$):</p>\[\nabla\Phi = \frac{d\Phi}{dr}\mathbf{e}_r = G\frac{M}{r^2}\mathbf{e}_r\]<p>Integrating this gives:</p>\[\boxed{\Phi = -G\frac{M}{r}} \label{eqn:g_potential}\tag{6}\]<p>Since only the relative difference in gravitational potential has meaning and the absolute magnitude of the value is meaningless, we can omit the integration constant. Usually the condition $\Phi \to 0$ as $r \to \infty$ is arbitrarily set to remove ambiguity, and equation ($\ref{eqn:g_potential}$) satisfies this condition.</p><p>For continuous mass distributions, the gravitational potential is:</p>\[\Phi = -G\int_V \frac{\rho(\mathbf{r^\prime})}{r}dv^\prime \label{eqn:g_potential_v}\tag{7}\]<p>For mass distributed on a thin shell surface:</p>\[\Phi = -G\int_S \frac{\rho_s}{r}da^\prime. \label{eqn:g_potential_s}\tag{8}\]<p>And for a linear mass source with linear density $\rho_l$:</p>\[\Phi = -G\int_\Gamma \frac{\rho_l}{r}ds^\prime. 
\label{eqn:g_potential_l}\tag{9}\]<h3 id="physical-meaning">Physical Meaning</h3><p>Consider the work per unit mass $dW^\prime$ that must be done on an object to move it by $d\mathbf{r}$ in a gravitational field.</p>\[\begin{align*} dW^\prime &amp;= -\mathbf{g}\cdot d\mathbf{r} = (\nabla \Phi)\cdot d\mathbf{r} \\ &amp;= \sum_i \frac{\partial \Phi}{\partial x_i}dx_i = d\Phi \label{eqn:work}\tag{10} \end{align*}\]<p>In this equation, $\Phi$ is a function of position coordinates only, expressed as $\Phi=\Phi(x_1, x_2, x_3) = \Phi(x_i)$. Therefore, the work per unit mass done on an object in moving it from one point to another in a gravitational field equals the potential difference between those two points.</p><p>If we define the gravitational potential at infinity to be $0$, then $\Phi$ at any point can be interpreted as the work per unit mass required to move the object from infinity to that point. Since the potential energy of an object equals the product of its mass and the gravitational potential $\Phi$, if $U$ is the potential energy:</p>\[U = m\Phi. \label{eqn:potential_e}\tag{11}\]<p>Therefore, the gravitational force on an object is obtained by taking the negative gradient of its potential energy:</p>\[\mathbf{F} = -\nabla U \label{eqn:force_and_potential}\tag{12}\]<p>When an object is placed in a gravitational field created by some mass, there is always some potential energy. 
Strictly speaking, this potential energy resides in the field itself, but it is conventionally expressed as the potential energy of the object.</p><h2 id="example-gravitational-potential-inside-and-outside-a-spherical-shell-shell-theorem">Example: Gravitational Potential Inside and Outside a Spherical Shell (Shell Theorem)</h2><h3 id="coordinate-setup--expressing-gravitational-potential-as-an-integral">Coordinate Setup &amp; Expressing Gravitational Potential as an Integral</h3><p>Let’s find the gravitational potential inside and outside a uniform spherical shell with inner radius $b$ and outer radius $a$. While the gravitational force due to a spherical shell can be obtained by directly calculating the force components acting on a unit mass in the field, using the potential method is simpler.</p><p><img src="/assets/img/gravitational-field-and-potential/spherical-shell.png" alt="Spherical shell" /></p><p>In the figure above, let’s calculate the potential at point $P$ at distance $R$ from the center. Assuming uniform mass distribution in the shell, $\rho(r^\prime)=\rho$, and due to symmetry about the azimuthal angle $\phi$ with respect to the line connecting the sphere’s center and point $P$:</p>\[\begin{align*} \Phi &amp;= -G\int_V \frac{\rho(r^\prime)}{r}dv^\prime \\ &amp;= -\rho G \int_0^{2\pi} \int_0^\pi \int_b^a \frac{1}{r}(dr^\prime)(r^\prime d\theta)(r^\prime \sin\theta\, d\phi) \\ &amp;= -\rho G \int_0^{2\pi} d\phi \int_b^a {r^\prime}^2 dr^\prime \int_0^\pi \frac{\sin\theta}{r}d\theta \\ &amp;= -2\pi\rho G \int_b^a {r^\prime}^2 dr^\prime \int_0^\pi \frac{\sin\theta}{r}d\theta. 
\label{eqn:spherical_shell_1}\tag{13} \end{align*}\]<p>By the law of cosines:</p>\[r^2 = {r^\prime}^2 + R^2 - 2r^\prime R \cos\theta \label{eqn:law_of_cosines}\tag{14}\]<p>Since $R$ and $r^\prime$ are held constant, differentiating this equation with respect to $\theta$:</p>\[2rdr = 2r^\prime R \sin\theta d\theta\] \[\frac{\sin\theta}{r}d\theta = \frac{dr}{r^\prime R} \tag{15}\]<p>Substituting this into equation ($\ref{eqn:spherical_shell_1}$):</p>\[\Phi = -\frac{2\pi\rho G}{R} \int_b^a r^\prime dr^\prime \int_{r_\mathrm{min}}^{r_\mathrm{max}} dr. \label{eqn:spherical_shell_2}\tag{16}\]<p>Here, $r_\mathrm{max}$ and $r_\mathrm{min}$ are determined by the position of point $P$.</p><h3 id="when-ra">When $R&gt;a$</h3>\[\begin{align*} \Phi(R&gt;a) &amp;= -\frac{2\pi\rho G}{R} \int_b^a r^\prime dr^\prime \int_{R-r^\prime}^{R+r^\prime} dr \\ &amp;= - \frac{4\pi\rho G}{R} \int_b^a {r^\prime}^2 dr^\prime \\ &amp;= - \frac{4}{3}\frac{\pi\rho G}{R}(a^3 - b^3). \label{eqn:spherical_shell_outside_1}\tag{17} \end{align*}\]<p>The mass $M$ of the spherical shell is:</p>\[M = \frac{4}{3}\pi\rho(a^3 - b^3) \label{eqn:mass_of_shell}\tag{18}\]<p>Therefore, the potential is:</p>\[\boxed{\Phi(R&gt;a) = -\frac{GM}{R}} \label{eqn:spherical_shell_outside_2}\tag{19}\]<blockquote class="prompt-info"><p>Comparing the gravitational potential due to a point mass of mass $M$ in equation ($\ref{eqn:g_potential}$) with the result just obtained ($\ref{eqn:spherical_shell_outside_2}$), we see they are identical. This means that when calculating the gravitational potential at any external point due to a spherically symmetric mass distribution, we can assume all mass is concentrated at the center. Most spherical celestial bodies of a certain size or larger, such as Earth or the Moon, fall into this category, as they can be considered as countless spherical shells with the same center but different diameters nested like <a href="https://en.wikipedia.org/wiki/Matryoshka_doll">Matryoshka dolls</a>. 
This provides the <a href="#newtons-law-of-universal-gravitation">justification for assuming celestial bodies like Earth or the Moon as point masses without size in calculations</a> mentioned at the beginning of this post.</p></blockquote><h3 id="when-rb">When $R&lt;b$</h3>\[\begin{align*} \Phi(R&lt;b) &amp;= -\frac{2\pi\rho G}{R} \int_b^a r^\prime dr^\prime \int_{r^\prime - R}^{r^\prime + R}dr \\ &amp;= -4\pi\rho G \int_b^a r^\prime dr^\prime \\ &amp;= -2\pi\rho G(a^2 - b^2). \label{eqn:spherical_shell_inside}\tag{20} \end{align*}\]<blockquote class="prompt-info"><p>Inside a spherically symmetric mass shell, the gravitational potential is constant regardless of position, and the gravitational force is $0$.</p></blockquote><blockquote class="prompt-tip"><p>This is also a major reason why the “Hollow Earth theory,” one of the representative pseudosciences, is nonsense. If Earth were a spherical shell with a hollow interior as claimed by the Hollow Earth theory, no gravitational force would act on any object inside that cavity. Considering Earth’s mass and volume, such a hollow cannot exist, and even if it did, life forms there would not live with the inner surface of the spherical shell as ground, but would float in a weightless state like in a space station.<br /> <a href="https://youtu.be/VD6xJq8NguY?si=szgtuLkuk6rPJag3">Microorganisms may live several kilometers deep underground</a>, but at least not in the form claimed by the Hollow Earth theory. 
I also really enjoy Jules Verne’s novel “Journey to the Center of the Earth” and the movie “Journey to the Center of the Earth,” but we should enjoy fiction as fiction and not seriously believe it.</p></blockquote><h3 id="when-bra">When $b&lt;R&lt;a$</h3>\[\begin{align*} \Phi(b&lt;R&lt;a) &amp;= -\frac{4\pi\rho G}{3R}(R^3 - b^3) - 2\pi\rho G(a^2 - R^2) \\ &amp;= -4\pi\rho G \left( \frac{a^2}{2} - \frac{b^3}{3R} - \frac{R^2}{6} \right) \label{eqn:within_spherical_shell}\tag{21} \end{align*}\]<h3 id="results">Results</h3><p>The gravitational potential $\Phi$ in the three regions obtained above, and the corresponding magnitude of the gravitational field vector $|\mathbf{g}|$ as functions of distance $R$, are shown graphically as follows:</p><p><img src="/physics-visualizations/figs/shell-theorem-gravitational-potential.png" alt="Gravitational Potential as a Function of R" /><br /> <img src="/physics-visualizations/figs/shell-theorem-field-vector.png" alt="Magnitude of the Field Vector as a Function of R" /></p><blockquote><ul><li>Python visualization code: <a href="https://github.com/yunseo-kim/physics-visualizations/blob/main/src/shell_theorem.py">yunseo-kim/physics-visualizations repository</a><li>License: <a href="https://github.com/yunseo-kim/physics-visualizations?tab=readme-ov-file#license">See here</a></ul></blockquote><p>We can see that both the gravitational potential and the magnitude of the gravitational field vector are continuous. If the gravitational potential were discontinuous at any point, the gradient of the potential at that point, i.e., the magnitude of gravity, would become infinite, which is not physically reasonable, so the potential function must be continuous at all points. 
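</p><p>This continuity can also be verified numerically from Eqs. (19)–(21): evaluating the three-branch potential just inside and just outside each boundary should give matching values. A minimal sketch (the density and radii are arbitrary illustrative values):</p>

```python
import math

G = 6.674e-11                  # gravitational constant (SI units)
rho, a, b = 5500.0, 2.0, 1.0   # arbitrary density (kg/m³), outer/inner radii (m)
M = 4 / 3 * math.pi * rho * (a**3 - b**3)  # shell mass, Eq. (18)

def phi(R):
    """Gravitational potential of a uniform spherical shell, Eqs. (19)-(21)."""
    if R >= a:   # outside: point-mass form
        return -G * M / R
    if R <= b:   # inside the cavity: constant potential, zero force
        return -2 * math.pi * rho * G * (a**2 - b**2)
    # within the shell material
    return -4 * math.pi * rho * G * (a**2 / 2 - b**3 / (3 * R) - R**2 / 6)

# The potential matches across both boundaries:
assert math.isclose(phi(a - 1e-9), phi(a + 1e-9), rel_tol=1e-6)
assert math.isclose(phi(b - 1e-9), phi(b + 1e-9), rel_tol=1e-6)
```

<p>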
However, the <em>derivative</em> of the gravitational field vector is discontinuous at the inner and outer surfaces of the shell.</p><h2 id="example-galactic-rotation-curves">Example: Galactic Rotation Curves</h2><p>According to astronomical observations, in many spiral galaxies that rotate about their centers, such as the Milky Way and Andromeda Galaxy, most of the observable mass is concentrated near the center. However, the orbital velocities of masses in these spiral galaxies greatly disagree with theoretically predicted values based on the observable mass distribution, as can be seen in the following graph, and remain nearly constant beyond a certain distance.</p><p><img src="https://upload.wikimedia.org/wikipedia/commons/b/b9/GalacticRotation2.svg" alt="Galactic Rotation" width="972" /></p><blockquote><p><em>Image source</em></p><ul><li>Author: Wikipedia user <a href="https://en.wikipedia.org/wiki/User:PhilHibbs">PhilHibbs</a><li>License: Public Domain</ul></blockquote><p> <video class="embed-video file" controls="" autoplay="" loop=""> <source src="https://cdn.jsdelivr.net/gh/yunseo-kim/yunseo-kim.github.io/assets/video/gravitational-field-and-potential/Galaxy_rotation_under_the_influence_of_dark_matter.webm" type="video/webm" /> <source src="https://cdn.jsdelivr.net/gh/yunseo-kim/yunseo-kim.github.io/assets/video/gravitational-field-and-potential/Galaxy_rotation_under_the_influence_of_dark_matter.ogg" type="video/ogg" /> Your browser does not support the video tag. Here is a <a href="https://cdn.jsdelivr.net/gh/yunseo-kim/yunseo-kim.github.io/assets/video/gravitational-field-and-potential/Galaxy_rotation_under_the_influence_of_dark_matter.webm">link to the video file</a> instead. 
</video> <em>Left: Predicted galactic rotation from observable mass | Right: Actual observed galactic rotation.</em></p><blockquote><p><em>Video source</em></p><ul><li>Original file (Ogg Theora video) link: <a href="https://commons.wikimedia.org/wiki/File:Galaxy_rotation_under_the_influence_of_dark_matter.ogv">https://commons.wikimedia.org/wiki/File:Galaxy_rotation_under_the_influence_of_dark_matter.ogv</a><li>Author: <a href="https://beltoforion.de/en/index.php">Ingo Berg</a><li>License: <a href="https://creativecommons.org/licenses/by/3.0/deed.en">CC BY-SA 3.0</a><li>Simulation method and code used: <a href="https://beltoforion.de/en/spiral_galaxy_renderer/">https://beltoforion.de/en/spiral_galaxy_renderer/</a></ul></blockquote><blockquote class="prompt-danger"><p>The image file previously embedded on this page, <code class="language-plaintext highlighter-rouge">Rotation curve of spiral galaxy Messier 33 (Triangulum).png</code>, was <a href="https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Rotation_curve_of_spiral_galaxy_Messier_33_(Triangulum).png">deleted from Wikimedia Commons</a> after it was determined to be a derivative work by Wikimedia user <a href="https://commons.wikimedia.org/wiki/User:Accrama">Mario De Leo</a> that plagiarized <a href="https://markwhittle.uvacreate.virginia.edu/">Prof. Mark Whittle</a> of the University of Virginia’s non-free work without proper citation. 
Accordingly, it has also been removed from this page.</p></blockquote><p>Let’s predict the orbital velocity as a function of distance when the galaxy’s mass is concentrated at the center, confirm that this prediction does not match the observational results, and show that the mass $M(R)$ distributed within distance $R$ from the galactic center must be proportional to $R$ to explain the observations.</p><p>First, if the galactic mass $M$ is concentrated at the center, the orbital velocity at distance $R$ is:</p>\[\frac{GMm}{R^2} = \frac{mv^2}{R}\] \[v = \sqrt{\frac{GM}{R}} \propto \frac{1}{\sqrt{R}}.\]<p>In this case, an orbital velocity decreasing as $1/\sqrt{R}$ is predicted, as shown by the dotted lines in the two graphs above. However, according to observational results, the orbital velocity $v$ is nearly constant regardless of distance $R$, so the prediction and observations do not match. These observational results can only be explained if $M(R)\propto R$.</p><p>Setting $M(R) = kR$ using proportionality constant $k$:</p>\[v = \sqrt{\frac{GM(R)}{R}} = \sqrt{Gk}\ \text{(constant)}.\]<p>From this, astrophysicists conclude that many galaxies must contain undiscovered “dark matter,” and this dark matter must account for more than 90% of the universe’s mass. However, the identity of dark matter has not yet been clearly revealed, and while not mainstream theory, attempts like Modified Newtonian Dynamics (MOND) exist to explain observational results without assuming the existence of dark matter. 
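</p><p>The two rotation-curve predictions derived above can be compared directly: with a central point mass, doubling $R$ reduces the orbital speed by $1/\sqrt{2}$, while with $M(R) = kR$ the speed stays flat. A minimal sketch (the values of $M$ and $k$ are arbitrary illustrative numbers):</p>

```python
import math

G = 6.674e-11   # gravitational constant (SI units)
M = 1.0e41      # illustrative central mass (kg)
k = 1.0e30      # illustrative proportionality constant in M(R) = k*R (kg/m)

def v_point_mass(R):
    """Orbital speed if all mass sits at the center: v = sqrt(GM/R) ∝ 1/sqrt(R)."""
    return math.sqrt(G * M / R)

def v_linear_mass(R):
    """Orbital speed if M(R) = k*R: v = sqrt(G*k), independent of R."""
    return math.sqrt(G * (k * R) / R)

# Doubling R: point-mass speed drops by 1/sqrt(2); linear-mass speed is flat.
assert math.isclose(v_point_mass(2.0) / v_point_mass(1.0), 1 / math.sqrt(2))
assert math.isclose(v_linear_mass(2.0), v_linear_mass(1.0))
```

<p>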
Today, these research fields are at the forefront of astrophysics.</p>]]> </content> </entry> <entry><title xml:lang="en">Method of Undetermined Coefficients</title><link href="https://www.yunseo.kim/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/method-of-undetermined-coefficients/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-04-20T00:00:00+09:00</published> <updated>2025-07-09T19:24:14+09:00</updated> <id>https://www.yunseo.kim/posts/method-of-undetermined-coefficients/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Differential Equation" /> <summary xml:lang="en">Explore the method of undetermined coefficients, a powerful technique for solving specific nonhomogeneous linear ODEs with constant coefficients, widely used in engineering for models like vibrating systems 
and RLC circuits.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Explore the method of undetermined coefficients, a powerful technique for solving specific nonhomogeneous linear ODEs with constant coefficients, widely used in engineering for models like vibrating systems and RLC circuits.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>Method of Undetermined Coefficients</strong> is applicable to:<ul><li>Linear ODEs $y^{\prime\prime} + ay^{\prime} + by = r(x)$<li>with <strong>constant coefficients $a$ and $b$</strong>,<li>and where the input $r(x)$ is an exponential function, a power of $x$, a cosine or sine, or sums and products of such functions.</ul><li><strong>Choice Rules for the Method of Undetermined Coefficients</strong><ul><li><strong>(a) Basic Rule</strong>: If $r(x)$ in Eq. ($\ref{eqn:linear_ode_with_constant_coefficients}$) is one of the functions in the first column of the table, choose the corresponding $y_p$ from the same row and determine its undetermined coefficients by substituting $y_p$ and its derivatives into Eq. 
($\ref{eqn:linear_ode_with_constant_coefficients}$).<li><strong>(b) Modification Rule</strong>: If a term in your choice for $y_p$ is a solution of the corresponding homogeneous ODE $y^{\prime\prime} + ay^{\prime} + by = 0$, multiply this term by $x$ (or by $x^2$ if this solution corresponds to a double root of the characteristic equation of the homogeneous ODE).<li><strong>(c) Sum Rule</strong>: If $r(x)$ is a sum of functions in the first column of the table, choose for $y_p$ the sum of the functions in the corresponding rows of the second column.</ul></ul><table><thead><tr><th style="text-align: left">Term in $r(x)$<th style="text-align: left">Choice for $y_p(x)$<tbody><tr><td style="text-align: left">$ke^{\gamma x}$<td style="text-align: left">$Ce^{\gamma x}$<tr><td style="text-align: left">$kx^n\ (n=0,1,\cdots)$<td style="text-align: left">$K_nx^n + K_{n-1}x^{n-1} + \cdots + K_1x + K_0$<tr><td style="text-align: left">$k\cos{\omega x}$<br />$k\sin{\omega x}$<td style="text-align: left">$K\cos{\omega x} + M\sin{\omega x}$<tr><td style="text-align: left">$ke^{\alpha x}\cos{\omega x}$<br />$ke^{\alpha x}\sin{\omega x}$<td style="text-align: left">$e^{\alpha x}(K\cos{\omega x} + M\sin{\omega x})$</table></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/homogeneous-linear-odes-of-second-order/">Homogeneous Linear ODEs of Second Order</a><li><a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">Homogeneous Linear ODEs with Constant Coefficients</a><li><a href="/posts/euler-cauchy-equation/">Euler-Cauchy Equation</a><li><a href="/posts/wronskian-existence-and-uniqueness-of-solutions/">Wronskian, Existence and Uniqueness of Solutions</a><li><a href="/posts/nonhomogeneous-linear-odes-of-second-order/">Nonhomogeneous Linear ODEs of Second Order</a><li>Vector Spaces, Linear Span (Linear Algebra)</ul><h2 id="method-of-undetermined-coefficients">Method of Undetermined Coefficients</h2><p>Consider a second-order nonhomogeneous 
linear ordinary differential equation where $r(x) \not\equiv 0$</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = r(x) \label{eqn:nonhomogeneous_linear_ode}\tag{1}\]<p>and its corresponding homogeneous ordinary differential equation</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0 \label{eqn:homogeneous_linear_ode}\tag{2}\]<p>As we saw in <a href="/posts/nonhomogeneous-linear-odes-of-second-order/">Nonhomogeneous Linear ODEs of Second Order</a>, to solve an initial value problem for the nonhomogeneous linear ODE ($\ref{eqn:nonhomogeneous_linear_ode}$), we must first solve the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) to find $y_h$, then find a particular solution $y_p$ of Eq. ($\ref{eqn:nonhomogeneous_linear_ode}$) to obtain the general solution</p>\[y(x) = y_h(x) + y_p(x) \label{eqn:general_sol}\tag{3}\]<p>So, how can we find $y_p$? A general method for finding $y_p$ is the <strong>method of variation of parameters</strong>, but in some cases, a much simpler method, the <strong>method of undetermined coefficients</strong>, can be applied. It is a frequently used method in engineering, especially as it can be applied to models of vibrating systems and RLC electrical circuits.</p><p>The method of undetermined coefficients is suitable for linear ODEs with <strong>constant coefficients $a$ and $b$</strong>, and where the input $r(x)$ is an exponential function, a power of $x$, a cosine or sine, or sums and products of such functions:</p>\[y^{\prime\prime} + ay^{\prime} + by = r(x) \label{eqn:linear_ode_with_constant_coefficients}\tag{4}\]<p>The key to the method of undetermined coefficients is that an $r(x)$ of this form has derivatives that are similar in form to itself. To apply this method, we choose a $y_p$ that is similar in form to $r(x)$ but has unknown coefficients, which are determined by substituting $y_p$ and its derivatives into the given ODE. 
For forms of $r(x)$ that are practically important in engineering, the rules for choosing an appropriate $y_p$ are as follows.</p><blockquote class="prompt-info"><p><strong>Choice Rules for the Method of Undetermined Coefficients</strong><br /> <strong>(a) Basic Rule</strong>: If $r(x)$ in Eq. ($\ref{eqn:linear_ode_with_constant_coefficients}$) is one of the functions in the first column of the table, choose the corresponding $y_p$ from the same row and determine its undetermined coefficients by substituting $y_p$ and its derivatives into Eq. ($\ref{eqn:linear_ode_with_constant_coefficients}$).<br /> <strong>(b) Modification Rule</strong>: If a term in your choice for $y_p$ is a solution of the corresponding homogeneous ODE $y^{\prime\prime} + ay^{\prime} + by = 0$, multiply this term by $x$ (or by $x^2$ if this solution corresponds to a double root of the characteristic equation of the homogeneous ODE).<br /> <strong>(c) Sum Rule</strong>: If $r(x)$ is a sum of functions in the first column of the table, choose for $y_p$ the sum of the functions in the corresponding rows of the second column.</p><table><thead><tr><th style="text-align: left">Term in $r(x)$<th style="text-align: left">Choice for $y_p(x)$<tbody><tr><td style="text-align: left">$ke^{\gamma x}$<td style="text-align: left">$Ce^{\gamma x}$<tr><td style="text-align: left">$kx^n\ (n=0,1,\cdots)$<td style="text-align: left">$K_nx^n + K_{n-1}x^{n-1} + \cdots + K_1x + K_0$<tr><td style="text-align: left">$k\cos{\omega x}$<br />$k\sin{\omega x}$<td style="text-align: left">$K\cos{\omega x} + M\sin{\omega x}$<tr><td style="text-align: left">$ke^{\alpha x}\cos{\omega x}$<br />$ke^{\alpha x}\sin{\omega x}$<td style="text-align: left">$e^{\alpha x}(K\cos{\omega x} + M\sin{\omega x})$</table></blockquote><p>This method has the advantage of being not only simple but also self-correcting. If you choose $y_p$ incorrectly or with too few terms, you will arrive at a contradiction. 
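</p><p>The basic and modification rules can be checked concretely with sympy. A sketch, where the two ODEs below are illustrative examples chosen for this check, not taken from the post:</p>

```python
import sympy as sp

x = sp.symbols('x')

# Basic rule: y'' + 3y' + 2y = 4e^x. Try y_p = C e^x; substituting gives
# C(1 + 3 + 2)e^x = 4e^x, so C = 2/3.
yp = sp.Rational(2, 3) * sp.exp(x)
lhs = yp.diff(x, 2) + 3 * yp.diff(x) + 2 * yp
assert sp.simplify(lhs - 4 * sp.exp(x)) == 0

# Modification rule: y'' - y = e^x. Here e^x solves the homogeneous ODE
# (a simple root of the characteristic equation), so try y_p = C x e^x
# instead; substituting gives 2C e^x = e^x, so C = 1/2.
yp = sp.Rational(1, 2) * x * sp.exp(x)
lhs = yp.diff(x, 2) - yp
assert sp.simplify(lhs - sp.exp(x)) == 0
```

<p>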
If you choose too many terms, the coefficients of the unnecessary terms will turn out to be $0$, leading to the correct result. Even if something goes wrong while applying the method, you will naturally notice it during the solution process. Therefore, as long as you choose a reasonably appropriate $y_p$ according to the choice rules above, you can try it without much hesitation.</p><h3 id="proof-of-the-sum-rule">Proof of the Sum Rule</h3><p>Consider a nonhomogeneous linear ODE of the form $r(x) = r_1(x) + r_2(x)$:</p>\[y^{\prime\prime} + ay^{\prime} + by = r_1(x) + r_2(x)\]<p>Now, let’s assume that the following two equations, with the same left-hand side but with inputs $r_1$ and $r_2$, have particular solutions ${y_p}_1$ and ${y_p}_2$, respectively.</p>\[\begin{gather*} y^{\prime\prime} + ay^{\prime} + by = r_1(x) \\ y^{\prime\prime} + ay^{\prime} + by = r_2(x) \end{gather*}\]<p>If we denote the left-hand side of the given equation as $L[y]$, then due to the linearity of $L[y]$, the sum rule holds because the following is satisfied for $y_p = {y_p}_1 + {y_p}_2$.</p>\[L[y_p] = L[{y_p}_1 + {y_p}_2] = L[{y_p}_1] + L[{y_p}_2] = r_1 + r_2 = r. 
\ \blacksquare\]<h2 id="example-yprimeprime--ayprime--by--kegamma-x">Example: $y^{\prime\prime} + ay^{\prime} + by = ke^{\gamma x}$</h2><p>According to the basic rule (a), we set $y_p = Ce^{\gamma x}$ and substitute it into the given equation $y^{\prime\prime} + ay^{\prime} + by = ke^{\gamma x}$:</p>\[\gamma^2 Ce^{\gamma x} + \gamma aCe^{\gamma x} + bCe^{\gamma x} = ke^{\gamma x}\] \[C(\gamma^2 + a\gamma + b)e^{\gamma x} = ke^{\gamma x}\] \[C(\gamma^2 + a\gamma + b) = k.\]<h3 id="case-where-gamma2--agamma--b-neq-0">Case where $\gamma^2 + a\gamma + b \neq 0$</h3><p>We can determine the undetermined coefficient $C$ and find $y_p$ as follows.</p>\[C = \frac{k}{\gamma^2 + a\gamma + b}\] \[y_p = Ce^{\gamma x} = \frac{k}{\gamma^2 + a\gamma + b} e^{\gamma x}.\]<h3 id="case-where-gamma2--agamma--b--0">Case where $\gamma^2 + a\gamma + b = 0$</h3><p>In this case, we must apply the modification rule (b). First, let’s find the roots of the characteristic equation of the homogeneous ODE $y^{\prime\prime} + ay^{\prime} + by = 0$ by using the fact that $b = -\gamma^2 - a\gamma = -\gamma(a + \gamma)$.</p>\[y^{\prime\prime} + ay^{\prime} - \gamma(a + \gamma)y = 0\] \[\lambda^2 + a\lambda - \gamma(a + \gamma) = 0\] \[(\lambda + (a + \gamma))(\lambda - \gamma) = 0\] \[\lambda = \gamma, -a -\gamma.\]<p>From this, we obtain the basis for the homogeneous ODE:</p>\[y_1 = e^{\gamma x}, \quad y_2 = e^{(-a - \gamma)x}\]<h4 id="case-where-gamma-neq--a-gamma">Case where $\gamma \neq -a-\gamma$</h4><p>Since the chosen $y_p = Ce^{\gamma x}$ is a solution of the corresponding homogeneous ODE but not a double root, we multiply this term by $x$ according to the modification rule (b) and set $y_p = Cxe^{\gamma x}$.</p><p>Now, substituting the modified $y_p$ back into the given equation $y^{\prime\prime} + ay^{\prime} - \gamma(a + \gamma)y = ke^{\gamma x}$:</p>\[C(2\gamma + \gamma^2 x)e^{\gamma x} + aC(1 + \gamma x)e^{\gamma x} - \gamma(a + \gamma)Cxe^{\gamma x} = ke^{\gamma x}\] \[C 
\left[\left\{\gamma^2 + a\gamma -\gamma(a + \gamma)\right\}x + 2\gamma + a \right]e^{\gamma x} = ke^{\gamma x}\] \[C(2\gamma + a)e^{\gamma x} = ke^{\gamma x}\] \[C(2\gamma + a) = k\] \[\therefore C = \frac{k}{2\gamma + a}, \quad y_p = Cxe^{\gamma x} = \frac{k}{2\gamma + a}xe^{\gamma x}.\]<h4 id="case-where-gamma---a-gamma">Case where $\gamma = -a-\gamma$</h4><p>In this case, the chosen $y_p = Ce^{\gamma x}$ corresponds to a double root of the characteristic equation of the homogeneous ODE. Therefore, according to the modification rule (b), we multiply this term by $x^2$ and set $y_p = Cx^2 e^{\gamma x}$.</p><p>Now, substituting the modified $y_p$ back into the given equation $y^{\prime\prime} - 2\gamma y^{\prime} + \gamma^2 y = ke^{\gamma x}$:</p>\[C(2 + 4\gamma x + \gamma^2 x^2)e^{\gamma x} + C(-4\gamma x - 2\gamma^2 x^2)e^{\gamma x} + C(\gamma^2 x^2)e^{\gamma x} = ke^{\gamma x}\] \[2Ce^{\gamma x} = ke^{\gamma x}\] \[2C = k\] \[\therefore C = \frac{k}{2}, \quad y_p = Cx^2 e^{\gamma x} = \frac{k}{2}x^2 e^{\gamma x}.\]<h2 id="extension-of-the-method-rx-as-a-product-of-functions">Extension of the Method: $r(x)$ as a Product of Functions</h2><p>Consider a nonhomogeneous linear ODE where $r(x)$ is of the form $C x^n e^{\alpha x}\cos(\omega x)$:</p>\[y^{\prime\prime} + ay^{\prime} + by = C x^n e^{\alpha x}\cos(\omega x)\]<p>If we assume $r(x)$ is a product of functions such as an exponential function $e^{\alpha x}$, a power of $x$ like $x^n$, and a cosine or sine function like $\cos{\omega x}$ or $\sin{\omega x}$ (here we assume cosine without loss of generality), or a sum of such products (i.e., it can be expressed as a sum and product of functions from the first column of the previous table), we will show that a solution $y_p$ exists which is itself a sum and product of functions from the second column of the same table.</p><blockquote class="prompt-tip"><p>For a rigorous proof, some parts are described using linear algebra and are marked with an asterisk (*).
You can skip these parts and still get a general understanding.</p></blockquote><h3 id="defining-the-vector-space-v">Defining the Vector Space $V$*</h3><p>For an $r(x)$ of the form \(\begin{align*} r(x) &amp;= C_1x^{n_1}e^{\alpha_1 x} \times C_2x^{n_2}e^{\alpha_2 x}\cos(\omega x) \times \cdots \\ &amp;= C x^n e^{\alpha x}\cos(\omega x) \end{align*}\)</p><p>we can define a vector space $V$ such that $r(x) \in V$ as follows:</p>\[V = \mathrm{span}\left\{x^k e^{\alpha x}\cos(\omega x), \; x^k e^{\alpha x}\sin(\omega x) \bigm| k=0,1,\dots,n \right\}\]<h3 id="derivative-forms-of-exponential-polynomial-and-trigonometric-functions">Derivative Forms of Exponential, Polynomial, and Trigonometric Functions</h3><p>The derivative forms of the basic functions presented in the first column of the previous table are as follows.</p><ul><li>Exponential function: $\cfrac{d}{dx}e^{\alpha x} = \alpha e^{\alpha x}$<li>Polynomial function: $\cfrac{d}{dx}x^m = mx^{m-1}$<li>Trigonometric functions: $\cfrac{d}{dx}\cos\omega x = -\omega\sin\omega x, \quad \cfrac{d}{dx}\sin\omega x = \omega\cos\omega x$</ul><p>The derivatives obtained by differentiating these functions are also expressed as a <u>sum of the same kinds of functions</u>.</p><p>Therefore, if functions $f$ and $g$ are the functions above or their sums, applying the product rule to $r(x) = f(x)g(x)$ gives</p>\[\begin{align*} (fg)^{\prime} &amp;= f^{\prime}g + fg^{\prime}, \\ (fg)^{\prime\prime} &amp;= f^{\prime\prime}g + 2f^{\prime}g^{\prime} + fg^{\prime\prime} \end{align*}\]<p>and here, $f$, $f^{\prime}$, $f^{\prime\prime}$ and $g$, $g^{\prime}$, $g^{\prime\prime}$ can all be written as sums or constant multiples of exponential, polynomial, and trigonometric functions. 
Thus, $r^{\prime}(x) = (fg)^{\prime}$ and $r^{\prime\prime}(x) = (fg)^{\prime\prime}$, like $r(x)$, can also be expressed as sums and products of these functions.</p><h3 id="invariance-of-v-under-the-differential-operator-d-and-linear-transformation-l">Invariance of $V$ under the Differential Operator $D$ and Linear Transformation $L$*</h3><p>That is, not only $r(x)$ itself, but also $r^{\prime}(x)$ and $r^{\prime\prime}(x)$ are linear combinations of terms of the form $x^k e^{\alpha x}\cos(\omega x)$ and $x^k e^{\alpha x}\sin(\omega x)$, so</p>\[r(x) \in V \implies r^{\prime}(x) \in V,\ r^{\prime\prime}(x) \in V.\]<p>Extending this beyond $r(x)$ alone: introducing the differential operator $D$ on the previously defined vector space $V$, we can state more generally that <em>the vector space $V$ is closed under the differentiation operation $D$</em>. Therefore, if we denote the left-hand side of the given equation, $y^{\prime\prime} + ay^{\prime} + by$, as $L[y]$, then <em>$V$ is invariant under $L$</em>.</p>\[D^2(V)\subseteq V,\quad aD(V)\subseteq V,\quad b\,V\subseteq V \implies L(V)\subseteq V.\]<p>Since $r(x) \in V$ and $V$ is invariant under $L$, the restriction of $L$ to the finite-dimensional space $V$ maps $V$ into itself. Whenever this restriction is invertible (that is, whenever no nonzero element of $V$ solves the homogeneous equation), there exists an element $y_p \in V$ that satisfies $L[y_p] = r$; the remaining case is exactly the one handled by the modification rule (b).</p>\[\exists y_p \in V: L[y_p] = r\]<h3 id="ansatz">Ansatz</h3><p>Therefore, if we choose an appropriate $y_p$ as a sum of all possible product terms using undetermined coefficients $A_0, A_1, \dots, A_n$ and $K, M$ as follows, we can determine the undetermined coefficients by substituting $y_p$ (or $xy_p$, $x^2y_p$) and its derivatives into the given equation, according to the basic rule (a) and the modification rule (b).
Here, $n$ should be determined according to the degree of $x$ in $r(x)$.</p>\[y_p = e^{\alpha x}(A_nx^n + A_{n-1}x^{n-1} + \cdots + A_1x + A_0)(K\cos{\omega x} + M \sin{\omega x}).\]<p>$\blacksquare$</p><blockquote class="prompt-warning"><p>If the given input $r(x)$ includes several different values of $\alpha_i$ and $\omega_j$, you must choose $y_p$ to include all possible terms of the form $x^{k}e^{\alpha_i x}\cos(\omega_j x)$ and $x^{k}e^{\alpha_i x}\sin(\omega_j x)$ for each $\alpha_i$ and $\omega_j$ value.<br /> The advantage of the method of undetermined coefficients is its simplicity. If the ansatz becomes too complicated and this advantage is lost, it might be better to use the method of variation of parameters, which will be discussed later.</p></blockquote><h2 id="extension-of-the-method-euler-cauchy-equation">Extension of the Method: Euler-Cauchy Equation</h2><p>The method of undetermined coefficients can be utilized not only for <a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">linear ODEs with constant coefficients</a> but also for the <a href="/posts/euler-cauchy-equation/">Euler-Cauchy equation</a>:</p>\[x^2y^{\prime\prime} + axy^{\prime} + by = r(x) \label{eqn:euler_cauchy}\tag{5}\]<h3 id="change-of-variables">Change of Variables</h3><p>By <a href="/posts/euler-cauchy-equation/#transformation-to-a-homogeneous-linear-ode-with-constant-coefficients">substituting $x = e^t$, as in the transformation to a linear ODE with constant coefficients</a>, we get</p>\[\frac{d}{dx} = \frac{1}{x}\frac{d}{dt}, \quad \frac{d^2}{dx^2} = \frac{1}{x^2}\left(\frac{d^2}{dt^2} - \frac{d}{dt} \right)\]<p>which, as we have seen before, allows us to convert the Euler-Cauchy equation into the following nonhomogeneous linear ODE with constant coefficients in terms of $t$.</p>\[y^{\prime\prime} + (a-1)y^{\prime} + by = r(e^t).
\label{eqn:substituted}\tag{6}\]<p>Now, we can apply the <a href="#method-of-undetermined-coefficients">previously discussed method of undetermined coefficients</a> to Eq. ($\ref{eqn:substituted}$) to solve for $t$, and finally, use $t = \ln x$ to find the solution in terms of $x$.</p><h3 id="case-where-rx-is-a-power-of-x-a-natural-logarithm-or-a-sumproduct-of-such-functions">Case where $r(x)$ is a power of $x$, a natural logarithm, or a sum/product of such functions</h3><p>In particular, if the input $r(x)$ consists of powers of $x$, natural logarithms, or sums and products of such functions, an appropriate $y_p$ can be chosen directly according to the following choice rules for the Euler-Cauchy equation.</p><blockquote class="prompt-info"><p><strong>Choice Rules for the Method of Undetermined Coefficients: For Euler-Cauchy Equations</strong><br /> <strong>(a) Basic Rule</strong>: If $r(x)$ in Eq. ($\ref{eqn:euler_cauchy}$) is one of the functions in the first column of the table, choose the corresponding $y_p$ from the same row and determine its undetermined coefficients by substituting $y_p$ and its derivatives into Eq. 
($\ref{eqn:euler_cauchy}$).<br /> <strong>(b) Modification Rule</strong>: If a term in your choice for $y_p$ is a solution of the corresponding homogeneous ODE $x^2y^{\prime\prime} + axy^{\prime} + by = 0$, multiply this term by $\ln{x}$ (or by $(\ln{x})^2$ if this solution corresponds to a double root of the characteristic equation of the homogeneous ODE).<br /> <strong>(c) Sum Rule</strong>: If $r(x)$ is a sum of functions in the first column of the table, choose for $y_p$ the sum of the functions in the corresponding rows of the second column.</p><table><thead><tr><th style="text-align: left">Term in $r(x)$<th style="text-align: left">Choice for $y_p(x)$<tbody><tr><td style="text-align: left">$kx^m\ (m=0,1,\cdots)$<td style="text-align: left">$Ax^m$<tr><td style="text-align: left">$kx^m \ln{x}\ (m=0,1,\cdots)$<td style="text-align: left">$x^m(B\ln x + C)$<tr><td style="text-align: left">$k(\ln{x})^s\ (s=0,1,\cdots)$<td style="text-align: left">$D_0 + D_1\ln{x} + \cdots + D_{s-1}(\ln{x})^{s-1} + D_s(\ln{x})^s$<tr><td style="text-align: left">$kx^m (\ln{x})^s$<br />$(m=0,1,\cdots ;\; s=0,1,\cdots)$<td style="text-align: left">$x^m \left( D_0 + D_1\ln{x} + \cdots + D_{s-1}(\ln{x})^{s-1} + D_s(\ln{x})^s \right)$</table></blockquote><p>This way, for practically important forms of the input $r(x)$, we can find the same $y_p$ as obtained through the <a href="#change-of-variables">change of variables</a> more quickly and easily. 
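As a quick illustration (an example chosen here for brevity), take $x^2y^{\prime\prime} + xy^{\prime} - y = \ln{x}$. The table suggests $y_p = B\ln{x} + C$; substituting $y_p^{\prime} = B/x$ and $y_p^{\prime\prime} = -B/x^2$ gives \[-B + B - (B\ln{x} + C) = \ln{x} \implies B = -1,\ C = 0,\] so $y_p = -\ln{x}$, the same result produced by the substitution $x = e^t$, which turns the equation into $y^{\prime\prime} - y = t$ with particular solution $y_p = -t$.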
You can derive these choice rules for the Euler-Cauchy equation by substituting $\ln{x}$ for $x$ in the <a href="#method-of-undetermined-coefficients">original choice rules</a> we looked at earlier.</p>]]> </content> </entry> <entry><title xml:lang="en">Nonhomogeneous Linear ODEs of Second Order</title><link href="https://www.yunseo.kim/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/nonhomogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-04-16T00:00:00+09:00</published> <updated>2025-07-09T19:24:14+09:00</updated> <id>https://www.yunseo.kim/posts/nonhomogeneous-linear-odes-of-second-order/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Differential Equation" /> <summary 
xml:lang="en">Explore the structure of the general solution for second-order nonhomogeneous linear ODEs in relation to their homogeneous counterparts. This post proves the existence of a general solution and the non-existence of singular solutions.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Explore the structure of the general solution for second-order nonhomogeneous linear ODEs in relation to their homogeneous counterparts. This post proves the existence of a general solution and the non-existence of singular solutions.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>General solution</strong> of a second-order nonhomogeneous linear ODE $y^{\prime\prime} + p(x)y^{\prime} + q(x)y = r(x)$:<ul><li>$y(x) = y_h(x) + y_p(x)$<li>$y_h$: The general solution of the homogeneous ODE $y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0$, which is $y_h = c_1y_1 + c_2y_2$<li>$y_p$: A particular solution of the given nonhomogeneous ODE</ul><li>The response term $y_p$ is determined solely by the input $r(x)$. For the same nonhomogeneous ODE, $y_p$ does not change even if the initial conditions change. 
The difference between two particular solutions of a nonhomogeneous ODE is a solution of the corresponding homogeneous ODE.<li><strong>Existence of a general solution</strong>: If the coefficients $p(x)$, $q(x)$, and the input function $r(x)$ of a nonhomogeneous ODE are continuous, a general solution always exists.<li><strong>Non-existence of singular solutions</strong>: The general solution includes all solutions of the equation (i.e., no singular solutions exist).</ul></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/homogeneous-linear-odes-of-second-order/">Homogeneous Linear ODEs of Second Order</a><li><a href="/posts/wronskian-existence-and-uniqueness-of-solutions/">The Wronskian, Existence and Uniqueness of Solutions</a></ul><h2 id="general-and-particular-solutions-of-second-order-nonhomogeneous-linear-odes">General and Particular Solutions of Second-Order Nonhomogeneous Linear ODEs</h2><p>Consider the second-order nonhomogeneous linear ordinary differential equation</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = r(x) \label{eqn:nonhomogeneous_linear_ode}\tag{1}\]<p>where $r(x) \not\equiv 0$. 
The <strong>general solution</strong> of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on an open interval $I$ is the sum of the general solution $y_h = c_1y_1 + c_2y_2$ of the corresponding homogeneous ODE</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0 \label{eqn:homogeneous_linear_ode}\tag{2}\]<p>and a particular solution $y_p$ of equation ($\ref{eqn:nonhomogeneous_linear_ode}$), in the form</p>\[y(x) = y_h(x) + y_p(x) \label{eqn:general_sol}\tag{3}\]<p>Furthermore, a <strong>particular solution</strong> of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on the interval $I$ is a solution obtained from equation ($\ref{eqn:general_sol}$) by assigning specific values to the arbitrary constants $c_1$ and $c_2$ in $y_h$.</p><p>In other words, adding an input $r(x)$, which depends only on the independent variable $x$, to the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) adds a corresponding term $y_p$ to the response. This added response term $y_p$ is determined solely by the input $r(x)$, regardless of the initial conditions. As we will see later, if we take the difference between any two solutions $Y_1$ and $Y_2$ of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) (i.e., the difference between particular solutions for two different sets of initial conditions), the term $y_p$, which is independent of the initial conditions, cancels out, leaving only the difference between their homogeneous parts ${y_h}_1$ and ${y_h}_2$.
By the <a href="/posts/homogeneous-linear-odes-of-second-order/#superposition-principle">Superposition Principle</a>, this difference is a solution of equation ($\ref{eqn:homogeneous_linear_ode}$).</p><h2 id="relationship-between-solutions-of-nonhomogeneous-and-corresponding-homogeneous-odes">Relationship Between Solutions of Nonhomogeneous and Corresponding Homogeneous ODEs</h2><blockquote class="prompt-info"><p><strong>Theorem 1: Relationship Between Solutions of Nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$) and Homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$)</strong><br /> <strong>(a)</strong> The sum of a solution $y$ of the nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$) and a solution $\tilde{y}$ of the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) on some open interval $I$ is a solution of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on $I$. In particular, equation ($\ref{eqn:general_sol}$) is a solution of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on $I$.<br /> <strong>(b)</strong> The difference between two solutions of the nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$) on an interval $I$ is a solution of the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) on $I$.</p></blockquote><h3 id="proof">Proof</h3><h4 id="a">(a)</h4><p>Let’s denote the left-hand side of equations ($\ref{eqn:nonhomogeneous_linear_ode}$) and ($\ref{eqn:homogeneous_linear_ode}$) as $L[y]$. 
Then, for any solution $y$ of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) and any solution $\tilde{y}$ of equation ($\ref{eqn:homogeneous_linear_ode}$) on interval $I$, the following holds:</p>\[L[y + \tilde{y}] = L[y] + L[\tilde{y}] = r + 0 = r.\]<h4 id="b">(b)</h4><p>For any two solutions $y$ and $y^*$ of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on interval $I$, the following holds:</p>\[L[y - y^*] = L[y] - L[y^*] = r - r = 0.\ \blacksquare\]<h2 id="the-general-solution-of-a-nonhomogeneous-ode-includes-all-solutions">The General Solution of a Nonhomogeneous ODE Includes All Solutions</h2><p>For a homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$), <a href="/posts/wronskian-existence-and-uniqueness-of-solutions/#the-general-solution-includes-all-solutions">we know that the general solution includes all solutions</a>. Let’s show that the same holds for the nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$).</p><blockquote class="prompt-info"><p><strong>Theorem 2: The General Solution of a Nonhomogeneous ODE Includes All Solutions</strong><br /> If the coefficients $p(x)$, $q(x)$, and the input function $r(x)$ of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) are continuous on some open interval $I$, then every solution of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on $I$ can be obtained from the general solution ($\ref{eqn:general_sol}$) of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on $I$ by assigning suitable values to the arbitrary constants $c_1$ and $c_2$ in $y_h$.</p></blockquote><h3 id="proof-1">Proof</h3><p>Let $y^*$ be any solution of equation ($\ref{eqn:nonhomogeneous_linear_ode}$) on $I$, and let $x_0$ be any $x$ in the interval $I$. By the <a href="/posts/wronskian-existence-and-uniqueness-of-solutions/#existence-of-a-general-solution">theorem on the Existence of a General Solution for homogeneous ODEs with continuous variable coefficients</a>, $y_h = c_1y_1 + c_2y_2$ exists. 
Also, by the <strong>method of variation of parameters</strong>, which we will discuss later, $y_p$ also exists. Therefore, the general solution ($\ref{eqn:general_sol}$) of the nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$) exists on the interval $I$. Now, by Theorem <a href="#relationship-between-solutions-of-nonhomogeneous-and-corresponding-homogeneous-odes">1(b)</a> which we proved earlier, $Y = y^* - y_p$ is a solution of the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) on interval $I$, and at $x_0$,</p>\[\begin{gather*} Y(x_0) = y^*(x_0) - y_p(x_0) \\ Y^{\prime}(x_0) = {y^*}^{\prime}(x_0) - y_p^{\prime}(x_0) \end{gather*}\]<p>According to the <a href="/posts/wronskian-existence-and-uniqueness-of-solutions/#existence-and-uniqueness-theorem-for-initial-value-problems">Existence and Uniqueness Theorem for Initial Value Problems</a>, for the initial conditions above, there exists a unique particular solution $Y$ of the homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) on interval $I$, which can be obtained by assigning suitable values to $c_1$ and $c_2$ in $y_h$. Since $y^* = Y + y_p$, we have shown that any particular solution $y^*$ of the nonhomogeneous ODE ($\ref{eqn:nonhomogeneous_linear_ode}$) can be obtained from the general solution ($\ref{eqn:general_sol}$). 
$\blacksquare$</p>]]> </content> </entry> <entry><title xml:lang="en">The Wronskian, Existence and Uniqueness of Solutions</title><link href="https://www.yunseo.kim/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/wronskian-existence-and-uniqueness-of-solutions/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-04-06T00:00:00+09:00</published> <updated>2025-07-11T21:22:11+09:00</updated> <id>https://www.yunseo.kim/posts/wronskian-existence-and-uniqueness-of-solutions/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Differential Equation" /> <summary xml:lang="en">Explore the existence and uniqueness of solutions for second-order homogeneous linear ODEs with continuous 
variable coefficients. Learn to use the Wronskian to test for linear independence and see why these equations always have a general solution that encompasses all possible solutions.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Explore the existence and uniqueness of solutions for second-order homogeneous linear ODEs with continuous variable coefficients. Learn to use the Wronskian to test for linear independence and see why these equations always have a general solution that encompasses all possible solutions.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><p>For a second-order homogeneous linear ordinary differential equation with continuous variable coefficients $p$ and $q$ on an interval $I$</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0\]<p>and initial conditions</p>\[y(x_0)=K_0, \qquad y^{\prime}(x_0)=K_1\]<p>the following four theorems hold.</p><ol><li><strong>Existence and Uniqueness Theorem for Initial Value Problems</strong>: The initial value problem consisting of the given equation and initial conditions has a unique solution $y(x)$ on the interval $I$.<li><strong>Test for Linear Dependence/Independence using the Wronskian</strong>: For two solutions $y_1$ and $y_2$ of the equation, if there exists an $x_0$ in the interval $I$ where the <strong>Wronskian</strong> $W(y_1, y_2) = y_1y_2^{\prime} - y_2y_1^{\prime}$ is $0$, then the two solutions are linearly dependent. 
Furthermore, if there exists an $x_1$ in the interval $I$ where $W\neq 0$, then the two solutions are linearly independent.<li><strong>Existence of a General Solution</strong>: The given equation has a general solution on the interval $I$.<li><strong>Nonexistence of Singular Solutions</strong>: This general solution includes all solutions of the equation (i.e., no singular solutions exist).</ol></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/Solution-of-First-Order-Linear-ODE/">Solution of First-Order Linear ODEs</a><li><a href="/posts/homogeneous-linear-odes-of-second-order/">Homogeneous Linear ODEs of Second Order</a><li><a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">Homogeneous Linear ODEs with Constant Coefficients</a><li><a href="/posts/euler-cauchy-equation/">Euler-Cauchy Equation</a><li>Inverse Matrix, Singular Matrix, and Determinant</ul><h2 id="homogeneous-linear-odes-with-continuous-variable-coefficients">Homogeneous Linear ODEs with Continuous Variable Coefficients</h2><p>Previously, we examined the general solutions of <a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">Homogeneous Linear ODEs with Constant Coefficients</a> and the <a href="/posts/euler-cauchy-equation/">Euler-Cauchy Equation</a>. In this article, we extend the discussion to a more general case: a second-order homogeneous linear ordinary differential equation with arbitrary continuous <strong>variable coefficients</strong> $p$ and $q$.</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0 \label{eqn:homogeneous_linear_ode_with_var_coefficients}\tag{1}\]<p>We will investigate the existence and form of the general solution for this equation. 
Additionally, we will explore the uniqueness of the solution to the <a href="/posts/homogeneous-linear-odes-of-second-order/#initial-value-problem-and-initial-conditions">Initial Value Problem</a> composed of the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) and the following two initial conditions:</p>\[y(x_0)=K_0, \qquad y^{\prime}(x_0)=K_1 \label{eqn:initial_conditions}\tag{2}\]<p>To state the conclusion upfront, the core of this discussion is that a <u>linear</u> ordinary differential equation with continuous coefficients does not have a <em>singular solution</em> (a solution that cannot be obtained from the general solution).</p><h2 id="existence-and-uniqueness-theorem-for-initial-value-problems">Existence and Uniqueness Theorem for Initial Value Problems</h2><blockquote class="prompt-info"><p><strong>Existence and Uniqueness Theorem for Initial Value Problems</strong><br /> If $p(x)$ and $q(x)$ are continuous functions on some open interval $I$, and $x_0$ is in $I$, then the initial value problem consisting of Eqs. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) and ($\ref{eqn:initial_conditions}$) has a unique solution $y(x)$ on the interval $I$.</p></blockquote><p>The proof of existence will not be covered here; we will only look at the proof of uniqueness. Proving uniqueness is typically simpler than proving existence.<br /> If you are not interested in the proof, you may skip this section and proceed to <a href="#linear-dependence-and-independence-of-solutions">Linear Dependence and Independence of Solutions</a>.</p><h3 id="proof-of-uniqueness">Proof of Uniqueness</h3><p>Let’s assume that the initial value problem consisting of the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) and initial conditions ($\ref{eqn:initial_conditions}$) has two solutions, $y_1(x)$ and $y_2(x)$, on the interval $I$. 
If we can show that their difference</p>\[y(x) = y_1(x) - y_2(x)\]<p>is identically zero on the interval $I$, this implies that $y_1 \equiv y_2$ on $I$, which means the solution is unique.</p><p>Since Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) is a homogeneous linear ODE, the linear combination $y$ of $y_1$ and $y_2$ is also a solution to the equation on $I$. Since $y_1$ and $y_2$ satisfy the same initial conditions ($\ref{eqn:initial_conditions}$), $y$ satisfies the conditions</p>\[\begin{align*} &amp; y(x_0) = y_1(x_0) - y_2(x_0) = 0, \\ &amp; y^{\prime}(x_0) = y_1^{\prime}(x_0) - y_2^{\prime}(x_0) = 0 \end{align*} \label{eqn:initial_conditions_*}\tag{3}\]<p>Now, consider the function</p>\[z(x) = y(x)^2 + y^{\prime}(x)^2\]<p>and its derivative</p>\[z^{\prime} = 2yy^{\prime} + 2y^{\prime}y^{\prime\prime}\]<p>From the ODE, we have</p>\[y^{\prime\prime} = -py^{\prime} - qy\]<p>Substituting this into the expression for $z^{\prime}$ gives</p>\[z^{\prime} = 2yy^{\prime} - 2p{y^{\prime}}^2 - 2qyy^{\prime} \label{eqn:z_prime}\tag{4}\]<p>Now, since $y$ and $y^{\prime}$ are real,</p>\[(y\pm y^{\prime})^2 = y^2 \pm 2yy^{\prime} + {y^{\prime}}^2 \geq 0\]<p>From this and the definition of $z$, we can derive two inequalities:</p>\[(a)\ 2yy^{\prime} \leq y^2 + {y^{\prime}}^2 = z, \qquad (b)\ 2yy^{\prime} \geq -(y^2 + {y^{\prime}}^2) = -z \label{eqn:inequalities}\tag{5}\]<p>From these two inequalities, we know that $|2yy^{\prime}| \leq z$. Thus, for the last term in Eq. ($\ref{eqn:z_prime}$), the following inequality holds:</p>\[\pm2qyy^{\prime} \leq |\pm 2qyy^{\prime}| = |q||2yy^{\prime}| \leq |q|z.\]<p>Using this result, along with $-p \leq |p|$, and applying inequality ($\ref{eqn:inequalities}$a) to the term $2yy^{\prime}$ in Eq. ($\ref{eqn:z_prime}$), we get</p>\[z^{\prime} \leq z + 2|p|{y^{\prime}}^2 + |q|z\]<p>Since ${y^{\prime}}^2 \leq y^2 + {y^{\prime}}^2 = z$, this leads to</p>\[z^{\prime} \leq (1 + 2|p| + |q|)z\]<p>Letting the function in the parentheses be $h = 1 + 2|p| + |q|$, we have</p>\[z^{\prime} \leq hz \quad \forall x \in I \label{eqn:inequality_6a}\tag{6a}\]<p>In the same way, from Eqs. ($\ref{eqn:z_prime}$) and ($\ref{eqn:inequalities}$), we get</p>\[\begin{align*} -z^{\prime} &amp;= -2yy^{\prime} + 2p{y^{\prime}}^2 + 2qyy^{\prime} \\ &amp;\leq z + 2|p|z + |q|z = hz \end{align*} \label{eqn:inequality_6b}\tag{6b}\]<p>These two inequalities, ($\ref{eqn:inequality_6a}$) and ($\ref{eqn:inequality_6b}$), are equivalent to the following inequalities:</p>\[z^{\prime} - hz \leq 0, \qquad z^{\prime} + hz \geq 0 \label{eqn:inequalities_7}\tag{7}\]<p>The <a href="/posts/Solution-of-First-Order-Linear-ODE/#nonhomogeneous-linear-ordinary-differential-equations">integrating factors</a> for the left-hand sides of these two expressions are</p>\[F_1 = e^{-\int h(x)\ dx} \qquad \text{and} \qquad F_2 = e^{\int h(x)\ dx}\]<p>Since $h$ is continuous, the indefinite integral $\int h(x)\ dx$ exists. As $F_1$ and $F_2$ are positive, from ($\ref{eqn:inequalities_7}$) we obtain</p>\[F_1(z^{\prime} - hz) = (F_1 z)^{\prime} \leq 0, \qquad F_2(z^{\prime} + hz) = (F_2 z)^{\prime} \geq 0\]<p>This means that on the interval $I$, $F_1 z$ is non-increasing and $F_2 z$ is non-decreasing. By Eq.
($\ref{eqn:initial_conditions_*}$), we have $z(x_0) = 0$, so</p>\[\begin{cases} \left(F_1 z \geq (F_1 z)_{x_0} = 0\right)\ \&amp; \ \left(F_2 z \leq (F_2 z)_{x_0} = 0\right) &amp; (x \leq x_0) \\ \left(F_1 z \leq (F_1 z)_{x_0} = 0\right)\ \&amp; \ \left(F_2 z \geq (F_2 z)_{x_0} = 0\right) &amp; (x \geq x_0) \end{cases}\]<p>Finally, dividing both sides of the inequalities by the positive functions $F_1$ and $F_2$, we can show the uniqueness of the solution as follows:</p>\[(z \leq 0) \ \&amp; \ (z \geq 0) \quad \forall x \in I\] \[z = y^2 + {y^{\prime}}^2 = 0 \quad \forall x \in I\] \[\therefore y \equiv y_1 - y_2 \equiv 0 \quad \forall x \in I. \ \blacksquare\]<h2 id="linear-dependence-and-independence-of-solutions">Linear Dependence and Independence of Solutions</h2><p>Let’s briefly recall what we covered in <a href="/posts/homogeneous-linear-odes-of-second-order/#basis-and-general-solution">Second-Order Homogeneous Linear ODEs</a>. The general solution on an open interval $I$ is constructed from a <strong>basis</strong> $y_1$, $y_2$ on $I$, which is a pair of linearly independent solutions. Here, $y_1$ and $y_2$ being <strong>linearly independent</strong> on an interval $I$ means that for all $x$ in the interval, the following holds:</p>\[k_1y_1(x) + k_2y_2(x) = 0 \Leftrightarrow k_1=0\text{ and }k_2=0 \label{eqn:linearly_independent}\tag{8}\]<p>If the above is not satisfied, and $k_1y_1(x) + k_2y_2(x) = 0$ holds for at least one non-zero $k_1$ or $k_2$, then $y_1$ and $y_2$ are <strong>linearly dependent</strong> on the interval $I$. 
In this case, for all $x$ in the interval $I$,</p>\[\text{(a) } y_1 = ky_2 \quad \text{or} \quad \text{(b) } y_2 = ly_1 \label{eqn:linearly_dependent}\tag{9}\]<p>which means $y_1$ and $y_2$ are proportional.</p><p>Now let’s look at the following test for linear independence/dependence of solutions.</p><blockquote class="prompt-info"><p><strong>Test for Linear Dependence/Independence using the Wronskian</strong><br /> <strong>i.</strong> If the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) has continuous coefficients $p(x)$ and $q(x)$ on an open interval $I$, then a necessary and sufficient condition for two solutions $y_1$ and $y_2$ of Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) to be linearly dependent on $I$ is that their <em>Wronski determinant</em>, or simply <strong>Wronskian</strong>, which is the following determinant,</p>\[W(y_1, y_2) = \begin{vmatrix} y_1 &amp; y_2 \\ y_1^{\prime} &amp; y_2^{\prime} \\ \end{vmatrix} = y_1y_2^{\prime} - y_2y_1^{\prime} \label{eqn:wronskian}\tag{10}\]<p>is zero at some $x_0$ in the interval $I$.</p>\[\exists x_0 \in I: W(x_0)=0 \iff y_1 \text{ and } y_2 \text{ are linearly dependent}\]<p><strong>ii.</strong> If $W=0$ at a point $x=x_0$ in the interval $I$, then $W=0$ for all $x$ in the interval $I$.</p>\[\exists x_0 \in I: W(x_0)=0 \implies \forall x \in I: W(x)=0\]<p>In other words, if there exists an $x_1$ in the interval $I$ such that $W\neq 0$, then $y_1$ and $y_2$ are linearly independent on that interval $I$.</p>\[\begin{align*} \exists x_1 \in I: W(x_1)\neq 0 &amp;\implies \forall x \in I: W(x)\neq 0 \\ &amp;\implies y_1 \text{ and } y_2 \text{ are linearly independent} \end{align*}\]</blockquote><blockquote class="prompt-tip"><p>The Wronskian was first introduced by the Polish mathematician Józef Maria Hoene-Wroński and was named after him posthumously in 11882 HE by the Scottish mathematician Sir Thomas Muir.</p></blockquote><h3 id="proof">Proof</h3><h4 id="i-a">i. 
(a)</h4><p>Let $y_1$ and $y_2$ be linearly dependent on the interval $I$. Then, either Eq. ($\ref{eqn:linearly_dependent}$a) or ($\ref{eqn:linearly_dependent}$b) holds on $I$. If Eq. ($\ref{eqn:linearly_dependent}$a) holds, then</p>\[W(y_1, y_2) = y_1y_2^{\prime} - y_2y_1^{\prime} = ky_2y_2^{\prime} - y_2(ky_2^{\prime}) = 0\]<p>Similarly, if Eq. ($\ref{eqn:linearly_dependent}$b) holds, then</p>\[W(y_1, y_2) = y_1y_2^{\prime} - y_2y_1^{\prime} = y_1(ly_1^{\prime}) - ly_1y_1^{\prime} = 0\]<p>Thus, we can confirm that the Wronskian $W(y_1, y_2)=0$ <u>for all $x$ in the interval $I$</u>.</p><h4 id="i-b">i. (b)</h4><p>Conversely, suppose that $W(y_1, y_2)=0$ for some $x = x_0$. We will show that $y_1$ and $y_2$ are linearly dependent on the interval $I$. Consider the system of linear equations for the unknowns $k_1$, $k_2$:</p>\[\begin{gather*} k_1y_1(x_0) + k_2y_2(x_0) = 0 \\ k_1y_1^{\prime}(x_0) + k_2y_2^{\prime}(x_0) = 0 \end{gather*} \label{eqn:linear_system}\tag{11}\]<p>This can be expressed in the form of a vector equation:</p>\[\left[\begin{matrix} y_1(x_0) &amp; y_2(x_0) \\ y_1^{\prime}(x_0) &amp; y_2^{\prime}(x_0) \end{matrix}\right] \left[\begin{matrix} k_1 \\ k_2 \end{matrix}\right] = 0 \label{eqn:vector_equation}\tag{12}\]<p>The coefficient matrix of this vector equation is</p>\[A = \left[\begin{matrix} y_1(x_0) &amp; y_2(x_0) \\ y_1^{\prime}(x_0) &amp; y_2^{\prime}(x_0) \end{matrix}\right]\]<p>and the determinant of this matrix is $W(y_1(x_0), y_2(x_0))$. Since $\det(A) = W=0$, $A$ is a <strong>singular matrix</strong> that does not have an <strong>inverse matrix</strong>. Therefore, the system of equations ($\ref{eqn:linear_system}$) has a non-trivial solution $(k_1, k_2) = (c_1, c_2)$ other than the zero vector $(0,0)$; that is, at least one of $c_1$ and $c_2$ is not zero. Now, let’s introduce the function</p>\[y(x) = c_1y_1(x) + c_2y_2(x)\]<p>Since Eq. 
($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) is homogeneous and linear, by the <a href="/posts/homogeneous-linear-odes-of-second-order/#superposition-principle">Superposition Principle</a>, this function is a solution of ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) on the interval $I$. From Eq. ($\ref{eqn:linear_system}$), we can see that this solution satisfies the initial conditions $y(x_0)=0$, $y^{\prime}(x_0)=0$.</p><p>Meanwhile, there exists a trivial solution $y^* \equiv 0$ that satisfies the same initial conditions $y^*(x_0)=0$, ${y^*}^{\prime}(x_0)=0$. Since the coefficients $p$ and $q$ of Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) are continuous, the uniqueness of the solution is guaranteed by the <a href="#existence-and-uniqueness-theorem-for-initial-value-problems">Existence and Uniqueness Theorem for Initial Value Problems</a>. Therefore, $y \equiv y^*$. That is, on the interval $I$,</p>\[c_1y_1 + c_2y_2 \equiv 0\]<p>Since at least one of $c_1$ and $c_2$ is not zero, this does not satisfy ($\ref{eqn:linearly_independent}$), which means that $y_1$ and $y_2$ are linearly dependent on the interval $I$.</p><h4 id="ii">ii.</h4><p>If $W(x_0)=0$ at some point $x_0$ in the interval $I$, then by <a href="#i-b">i.(b)</a>, $y_1$ and $y_2$ are linearly dependent on the interval $I$. Then, by <a href="#i-a">i.(a)</a>, $W\equiv 0$. Therefore, if there is even one point $x_1$ in the interval $I$ where $W(x_1)\neq 0$, then $y_1$ and $y_2$ are linearly independent. 
$\blacksquare$</p><h2 id="the-general-solution-includes-all-solutions">The General Solution Includes All Solutions</h2><h3 id="existence-of-a-general-solution">Existence of a General Solution</h3><blockquote class="prompt-info"><p>If $p(x)$ and $q(x)$ are continuous on an open interval $I$, then the equation ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) has a general solution on the interval $I$.</p></blockquote><h4 id="proof-1">Proof</h4><p>By the <a href="#existence-and-uniqueness-theorem-for-initial-value-problems">Existence and Uniqueness Theorem for Initial Value Problems</a>, the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) has a solution $y_1(x)$ on the interval $I$ that satisfies the initial conditions</p>\[y_1(x_0) = 1, \qquad y_1^{\prime}(x_0) = 0\]<p>and a solution $y_2(x)$ on the interval $I$ that satisfies the initial conditions</p>\[y_2(x_0) = 0, \qquad y_2^{\prime}(x_0) = 1\]<p>The Wronskian of these two solutions at $x=x_0$ has a non-zero value:</p>\[W(y_1(x_0), y_2(x_0)) = y_1(x_0)y_2^{\prime}(x_0) - y_2(x_0)y_1^{\prime}(x_0) = 1\cdot 1 - 0\cdot 0 = 1\]<p>Therefore, by the <a href="#linear-dependence-and-independence-of-solutions">Test for Linear Dependence/Independence using the Wronskian</a>, $y_1$ and $y_2$ are linearly independent on the interval $I$. Thus, these two solutions form a basis of solutions for Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) on the interval $I$, and a general solution $y = c_1y_1 + c_2y_2$ with arbitrary constants $c_1$, $c_2$ must exist on the interval $I$. $\blacksquare$</p><h3 id="nonexistence-of-singular-solutions">Nonexistence of Singular Solutions</h3><blockquote class="prompt-info"><p>If the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) has continuous coefficients $p(x)$ and $q(x)$ on some open interval $I$, then every solution $y=Y(x)$ of Eq. 
($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) on the interval $I$ is of the form</p>\[Y(x) = C_1y_1(x) + C_2y_2(x) \label{eqn:particular_solution}\tag{13}\]<p>where $y_1$, $y_2$ form a basis of solutions for Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) on the interval $I$, and $C_1$, $C_2$ are suitable constants.<br /> That is, Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) does not have a <strong>singular solution</strong>, which is a solution that cannot be obtained from the general solution.</p></blockquote><h4 id="proof-2">Proof</h4><p>Let $y=Y(x)$ be any solution of Eq. ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) on the interval $I$. Now, by the <a href="#existence-of-a-general-solution">Existence of a General Solution theorem</a>, the ODE ($\ref{eqn:homogeneous_linear_ode_with_var_coefficients}$) has a general solution on the interval $I$:</p>\[y(x) = c_1y_1(x) + c_2y_2(x) \label{eqn:general_solution}\tag{14}\]<p>Now we must show that for any $Y(x)$, there exist constants $c_1$, $c_2$ such that $y(x)=Y(x)$ on the interval $I$. Let’s first show that we can find values for $c_1$, $c_2$ such that for an arbitrary $x_0$ in $I$, we have $y(x_0)=Y(x_0)$ and $y^{\prime}(x_0)=Y^{\prime}(x_0)$. From Eq. ($\ref{eqn:general_solution}$), we get</p>\[\begin{gather*} \left[\begin{matrix} y_1(x_0) &amp; y_2(x_0) \\ y_1^{\prime}(x_0) &amp; y_2^{\prime}(x_0) \end{matrix}\right] \left[\begin{matrix} c_1 \\ c_2 \end{matrix}\right] = \left[\begin{matrix} Y(x_0) \\ Y^{\prime}(x_0) \end{matrix}\right] \end{gather*} \label{eqn:vector_equation_2}\tag{15}\]<p>Since $y_1$ and $y_2$ form a basis, the determinant of the coefficient matrix, which is the Wronskian $W(y_1(x_0), y_2(x_0))$, is non-zero. Therefore, Eq. ($\ref{eqn:vector_equation_2}$) can be solved for $c_1$ and $c_2$. Let the solution be $(c_1, c_2) = (C_1, C_2)$. Substituting this into Eq. 
($\ref{eqn:general_solution}$) gives the following particular solution:</p>\[y^*(x) = C_1y_1(x) + C_2y_2(x).\]<p>Since $C_1$, $C_2$ are the solution to Eq. ($\ref{eqn:vector_equation_2}$),</p>\[y^*(x_0) = Y(x_0), \qquad {y^*}^{\prime}(x_0) = Y^{\prime}(x_0)\]<p>By the uniqueness part of the <a href="#existence-and-uniqueness-theorem-for-initial-value-problems">Existence and Uniqueness Theorem for Initial Value Problems</a>, we have $y^* \equiv Y$ for all $x$ in the interval $I$. $\blacksquare$</p>]]> </content> </entry> <entry><title xml:lang="en">Euler-Cauchy Equation</title><link href="https://www.yunseo.kim/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/euler-cauchy-equation/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-03-28T00:00:00+09:00</published> <updated>2025-07-09T19:24:14+09:00</updated> <id>https://www.yunseo.kim/posts/euler-cauchy-equation/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> 
<category term="Differential Equation" /> <summary xml:lang="en">Explore the different forms of the general solution to the Euler-Cauchy equation based on the sign of the discriminant of its auxiliary equation.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Explore the different forms of the general solution to the Euler-Cauchy equation based on the sign of the discriminant of its auxiliary equation.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li>Euler-Cauchy equation: $x^2y^{\prime\prime} + axy^{\prime} + by = 0$<li><strong>Auxiliary equation</strong>: $m^2 + (a-1)m + b = 0$<li>The form of the general solution can be divided into three cases, as shown in the table, depending on the sign of the discriminant $(1-a)^2 - 4b$ of the auxiliary equation.</ul><table><thead><tr><th style="text-align: center">Case<th style="text-align: center">Roots of Auxiliary Equation<th style="text-align: center">Basis of Solutions for Euler-Cauchy Equation<th style="text-align: center">General Solution of Euler-Cauchy Equation<tbody><tr><td style="text-align: center">I<td style="text-align: center">Distinct real roots<br />$m_1$, $m_2$<td style="text-align: center">$x^{m_1}$, $x^{m_2}$<td style="text-align: center">$y = c_1 x^{m_1} + c_2 x^{m_2}$<tr><td style="text-align: center">II<td style="text-align: center">Real double root<br /> $m = \cfrac{1-a}{2}$<td style="text-align: center">$x^{(1-a)/2}$, $x^{(1-a)/2}\ln{x}$<td style="text-align: center">$y = (c_1 + c_2 \ln x)x^m$<tr><td style="text-align: center">III<td style="text-align: center">Complex conjugate roots<br /> $m_1 = \cfrac{1}{2}(1-a) + i\omega$, <br /> $m_2 = \cfrac{1}{2}(1-a) - i\omega$<td style="text-align: center">$x^{(1-a)/2}\cos{(\omega \ln{x})}$, <br /> $x^{(1-a)/2}\sin{(\omega \ln{x})}$<td style="text-align: center">$y = x^{(1-a)/2}[A\cos{(\omega \ln{x})} + 
B\sin{(\omega \ln{x})}]$</table></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/homogeneous-linear-odes-of-second-order/">Homogeneous Linear ODEs of Second Order</a><li><a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">Homogeneous Linear ODEs with Constant Coefficients</a><li>Euler’s Formula</ul><h2 id="auxiliary-equation">Auxiliary Equation</h2><p>The <strong>Euler-Cauchy equation</strong> is an ordinary differential equation of the form</p>\[x^2y^{\prime\prime} + axy^{\prime} + by = 0 \label{eqn:euler_cauchy_eqn}\tag{1}\]<p>with given constants $a$ and $b$, and an unknown function $y(x)$. Substituting</p>\[y=x^m, \qquad y^{\prime}=mx^{m-1}, \qquad y^{\prime\prime}=m(m-1)x^{m-2}\]<p>into Eq. ($\ref{eqn:euler_cauchy_eqn}$) gives</p>\[x^2m(m-1)x^{m-2} + axmx^{m-1} + bx^m = 0,\]<p>which simplifies to</p>\[[m(m-1) + am + b]x^m = 0\]<p>From this, we obtain the auxiliary equation</p>\[m^2 + (a-1)m + b = 0 \label{eqn:auxiliary_eqn}\tag{2}\]<p>and the necessary and sufficient condition for $y=x^m$ to be a solution of the Euler-Cauchy equation ($\ref{eqn:euler_cauchy_eqn}$) is that $m$ is a root of the auxiliary equation ($\ref{eqn:auxiliary_eqn}$).</p><p>Solving the quadratic equation ($\ref{eqn:auxiliary_eqn}$) gives the roots</p>\[\begin{align*} m_1 &amp;= \frac{1}{2}\left[(1-a) + \sqrt{(1-a)^2 - 4b} \right], \\ m_2 &amp;= \frac{1}{2}\left[(1-a) - \sqrt{(1-a)^2 - 4b} \right] \end{align*}\label{eqn:m1_and_m2}\tag{3}\]<p>and from this, the two functions</p>\[y_1 = x^{m_1}, \quad y_2 = x^{m_2}\]<p>are solutions to equation ($\ref{eqn:euler_cauchy_eqn}$).</p><p>As with <a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">Homogeneous Linear ODEs with Constant Coefficients</a>, we can divide this into three cases based on the sign of the discriminant $(1-a)^2 - 4b$ of the auxiliary equation ($\ref{eqn:auxiliary_eqn}$).</p><ul><li>$(1-a)^2 - 4b &gt; 0$: Distinct real roots<li>$(1-a)^2 - 4b = 0$: Real double 
root<li>$(1-a)^2 - 4b &lt; 0$: Complex conjugate roots</ul><h2 id="forms-of-the-general-solution-based-on-the-sign-of-the-discriminant">Forms of the General Solution Based on the Sign of the Discriminant</h2><h3 id="i-distinct-real-roots-m_1-and-m_2">I. Distinct Real Roots $m_1$ and $m_2$</h3><p>In this case, a basis of solutions for equation ($\ref{eqn:euler_cauchy_eqn}$) on any interval is</p>\[y_1 = x^{m_1}, \quad y_2 = x^{m_2}\]<p>and the corresponding general solution is</p>\[y = c_1 x^{m_1} + c_2 x^{m_2} \label{eqn:general_sol_1}\tag{4}\]<h3 id="ii-real-double-root-m--cfrac1-a2">II. Real Double Root $m = \cfrac{1-a}{2}$</h3><p>When $(1-a)^2 - 4b = 0$, i.e., $b=\cfrac{(1-a)^2}{4}$, the quadratic equation ($\ref{eqn:auxiliary_eqn}$) has only one root $m = m_1 = m_2 = \cfrac{1-a}{2}$. Therefore, the one solution of the form $y = x^m$ we can obtain is</p>\[y_1 = x^{(1-a)/2}\]<p>and the Euler-Cauchy equation ($\ref{eqn:euler_cauchy_eqn}$) takes the form</p>\[y^{\prime\prime} + \frac{a}{x}y^{\prime} + \frac{(1-a)^2}{4x^2}y = 0 \label{eqn:standard_form}\tag{5}\]<p>Now, let’s find another linearly independent solution $y_2$ using <a href="/posts/homogeneous-linear-odes-of-second-order/#reduction-of-order">reduction of order</a>.</p><p>If we set the second solution we are looking for as $y_2=uy_1$, we get</p>\[u = \int U, \qquad U = \frac{1}{y_1^2}\exp\left(-\int \frac{a}{x}\ dx \right)\]<p>Since $\exp \left(-\int \cfrac{a}{x}\ dx \right) = \exp (-a\ln x) = \exp(\ln{x^{-a}}) = x^{-a}$,</p>\[U = \frac{x^{-a}}{y_1^2} = \frac{x^{-a}}{x^{(1-a)}} = \frac{1}{x}\]<p>and integrating gives $u = \ln x$.</p><p>Therefore, $y_2 = uy_1 = y_1 \ln x$, and since their quotient is not a constant, $y_1$ and $y_2$ are linearly independent. The general solution corresponding to the basis $y_1$ and $y_2$ is</p>\[y = (c_1 + c_2 \ln x)x^m \label{eqn:general_sol_2}\tag{6}\]<h3 id="iii-complex-conjugate-roots">III. 
Complex Conjugate Roots</h3><p>In this case, the roots of the auxiliary equation ($\ref{eqn:auxiliary_eqn}$) are $m = \cfrac{1}{2}(1-a) \pm i\sqrt{b - \frac{1}{4}(1-a)^2}$, and the corresponding two complex solutions of the Euler-Cauchy equation ($\ref{eqn:euler_cauchy_eqn}$) can be written as follows, using the fact that $x=e^{\ln x}$.</p>\[\begin{align*} x^{m_1} &amp;= x^{(1-a)/2 + i\sqrt{b - \frac{1}{4}(1-a)^2}} \\ &amp;= x^{(1-a)/2}(e^{\ln x})^{i\sqrt{b - \frac{1}{4}(1-a)^2}} \\ &amp;= x^{(1-a)/2}e^{i(\sqrt{b - \frac{1}{4}(1-a)^2}\ln x)}, \\ x^{m_2} &amp;= x^{(1-a)/2 - i\sqrt{b - \frac{1}{4}(1-a)^2}} \\ &amp;= x^{(1-a)/2}(e^{\ln x})^{-i\sqrt{b - \frac{1}{4}(1-a)^2}} \\ &amp;= x^{(1-a)/2}e^{i(-\sqrt{b - \frac{1}{4}(1-a)^2}\ln x)}. \end{align*} \tag{7}\]<p>By setting $t=\sqrt{b - \frac{1}{4}(1-a)^2}\ln x$ and using Euler’s formula $e^{it} = \cos{t} + i\sin{t}$, we can see that</p>\[\begin{align*} x^{m_1} &amp;= x^{(1-a)/2}\left[\cos\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) + i\sin\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) \right], \\ x^{m_2} &amp;= x^{(1-a)/2}\left[\cos\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) - i\sin\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) \right] \end{align*} \tag{8}\]<p>and from this, we obtain the following two real solutions</p>\[\begin{align*} \frac{x^{m_1} + x^{m_2}}{2} &amp;= x^{(1-a)/2}\cos\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right), \\ \frac{x^{m_1} - x^{m_2}}{2i} &amp;= x^{(1-a)/2}\sin\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) \end{align*} \tag{9}\]<p>Each of these is a linear combination of solutions and is therefore itself a solution of the Euler-Cauchy equation ($\ref{eqn:euler_cauchy_eqn}$) by the <a href="/posts/homogeneous-linear-odes-of-second-order/#superposition-principle">superposition principle</a>. Since their quotient $\cot\left(\sqrt{b - \frac{1}{4}(1-a)^2}\ln x \right)$ is not a constant, the two solutions above are linearly independent and thus form a basis for equation ($\ref{eqn:euler_cauchy_eqn}$). 
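As a quick sanity check, one of these real basis solutions can be verified against the ODE numerically. This is a minimal sketch with assumed sample constants $a = 1$, $b = 1$ (so the discriminant is $-4$, the exponent $(1-a)/2$ is $0$, $\sqrt{b - \frac{1}{4}(1-a)^2} = 1$, and one basis solution reduces to $\cos(\ln x)$):

```python
import math

# Sample Euler-Cauchy equation x^2 y'' + x y' + y = 0 (assumed a = 1, b = 1):
# discriminant (1-a)^2 - 4b = -4 < 0, so one real basis solution is
# y = cos(ln x). Its exact derivatives are:
#   y'  = -sin(ln x)/x
#   y'' = (sin(ln x) - cos(ln x))/x^2
def residual(x):
    y = math.cos(math.log(x))
    dy = -math.sin(math.log(x)) / x
    d2y = (math.sin(math.log(x)) - math.cos(math.log(x))) / x**2
    return x**2 * d2y + x * dy + y  # should vanish identically for x > 0

for x in (0.5, 1.0, 3.0, 10.0):
    assert abs(residual(x)) < 1e-12
print("x^2 y'' + x y' + y vanishes at all sampled points")
```

Substituting the exact derivatives makes the residual cancel term by term, just as in the derivation above.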
From this, we obtain the following real general solution.</p>\[y = x^{(1-a)/2} \left[ A\cos\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) + B\sin\left(\sqrt{b - \tfrac{1}{4}(1-a)^2}\ln x \right) \right]. \label{eqn:general_sol_3}\tag{10}\]<p>However, the case where the auxiliary equation of an Euler-Cauchy equation has complex conjugate roots is not of great practical importance.</p><h2 id="transformation-to-a-homogeneous-linear-ode-with-constant-coefficients">Transformation to a Homogeneous Linear ODE with Constant Coefficients</h2><p>The Euler-Cauchy equation can be transformed into a <a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">homogeneous linear ODE with constant coefficients</a> through a change of variables.</p><p>By substituting $x = e^t$, we get</p>\[\frac{d}{dx} = \frac{1}{x}\frac{d}{dt}, \quad \frac{d^2}{dx^2} = \frac{1}{x^2}\left(\frac{d^2}{dt^2} - \frac{d}{dt} \right)\]<p>and the Euler-Cauchy equation ($\ref{eqn:euler_cauchy_eqn}$) is transformed into the following homogeneous linear ODE with constant coefficients in terms of $t$.</p>\[y^{\prime\prime}(t) + (a-1)y^{\prime}(t) + by(t) = 0. 
\label{eqn:substituted}\tag{11}\]<p>If we solve equation ($\ref{eqn:substituted}$) for $t$ by applying the solution method for <a href="/posts/homogeneous-linear-odes-with-constant-coefficients/">homogeneous linear ODEs with constant coefficients</a>, and then transform the resulting solution back into a solution in terms of $x$ using $t = \ln{x}$, we obtain <a href="#forms-of-the-general-solution-based-on-the-sign-of-the-discriminant">the same results as seen before</a>.</p>]]> </content> </entry> <entry><title xml:lang="en">Testing for Convergence or Divergence of a Series</title><link href="https://www.yunseo.kim/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/testing-for-convergence-or-divergence-of-a-series/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/testing-for-convergence-or-divergence-of-a-series/" 
rel="alternate" type="text/html" hreflang="cs" /><published>2025-03-18T00:00:00+09:00</published> <updated>2025-05-13T16:24:07+09:00</updated> <id>https://www.yunseo.kim/posts/testing-for-convergence-or-divergence-of-a-series/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Calculus" /> <summary xml:lang="en">A comprehensive examination of various methods for determining whether a series converges or diverges.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>A comprehensive examination of various methods for determining whether a series converges or diverges.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>$n$th-term test for divergence</strong>: $\lim_{n\to\infty} a_n \neq 0 \Rightarrow \text{series }\sum a_n \text{ diverges}$<li><strong>Convergence/divergence of geometric series</strong>: The geometric series $\sum ar^{n-1}$<ul><li>converges if $|r| &lt; 1$<li>diverges if $|r| \geq 1$</ul><li><strong>Convergence/divergence of $p$-series</strong>: The $p$-series $\sum \cfrac{1}{n^p}$<ul><li>converges if $p&gt;1$<li>diverges if $p\leq 1$</ul><li><strong>Comparison Test</strong>: If $0 \leq a_n \leq b_n$, then<ul><li>$\sum b_n &lt; \infty \ \Rightarrow \ \sum a_n &lt; \infty$<li>$\sum a_n = \infty \ \Rightarrow \ \sum b_n = \infty$</ul><li><strong>Limit Comparison Test</strong>: If $\lim_{n\to\infty} \frac{a_n}{b_n} = c \text{ (}c\text{ is a finite positive number)}$, then both series $\sum a_n$ and $\sum b_n$ either both converge or both diverge<li>For a series of positive terms $\sum a_n$ and a positive number $\epsilon &lt; 1$<ul><li>If $\sqrt[n]{a_n}&lt; 1-\epsilon$ for all $n$, then the series $\sum a_n$ converges<li>If $\sqrt[n]{a_n}&gt; 1+\epsilon$ for all $n$, then the series $\sum a_n$ diverges</ul><li><strong>Root Test</strong>: For a series of 
positive terms $\sum a_n$, if the limit $\lim_{n\to\infty} \sqrt[n]{a_n} =: r$ exists, then<ul><li>the series $\sum a_n$ converges if $r&lt;1$<li>the series $\sum a_n$ diverges if $r&gt;1$</ul><li><strong>Ratio Test</strong>: For a sequence of positive terms $(a_n)$ and $0 &lt; r &lt; 1$<ul><li>If $a_{n+1}/a_n \leq r$ for all $n$, then the series $\sum a_n$ converges<li>If $a_{n+1}/a_n \geq 1$ for all $n$, then the series $\sum a_n$ diverges</ul><li>For a sequence of positive numbers $(a_n)$, if the limit $\rho := \lim_{n\to\infty} \cfrac{a_{n+1}}{a_n}$ exists, then<ul><li>the series $\sum a_n$ converges if $\rho &lt; 1$<li>the series $\sum a_n$ diverges if $\rho &gt; 1$</ul><li><strong>Integral Test</strong>: For a continuous, decreasing function $f: \left[1,\infty \right) \rightarrow \mathbb{R}$ with $f(x)&gt;0$ for all $x$, the series $\sum f(n)$ converges if and only if the integral $\int_1^\infty f(x)\ dx := \lim_{b\to\infty} \int_1^b f(x)\ dx$ converges<li><strong>Alternating Series Test</strong>: An alternating series $\sum a_n$ converges if the following conditions are satisfied:<ol><li>$a_n$ and $a_{n+1}$ have opposite signs for all $n$<li>$|a_n| \geq |a_{n+1}|$ for all $n$<li>$\lim_{n\to\infty} a_n = 0$</ol><li>A series that converges absolutely also converges. The converse is not true.</ul></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/sequences-and-series/">Sequences and Series</a></ul><h2 id="introduction">Introduction</h2><p>In the previous post on <a href="/posts/sequences-and-series/#convergence-and-divergence-of-series">Sequences and Series</a>, we covered the definitions of convergence and divergence of series. In this post, we will summarize various methods for determining whether a series converges or diverges. 
Generally, testing for convergence or divergence of a series is much easier than finding the exact sum of the series.</p><h2 id="the-nth-term-test">The $n$th-Term Test</h2><p>For a series $\sum a_n$, we call $a_n$ the <strong>general term</strong> of the series.</p><p>The following theorem allows us to easily identify some obviously divergent series, making it a wise first step when testing for convergence or divergence to avoid wasting time.</p><blockquote class="prompt-info"><p><strong>$n$th-term test for divergence</strong><br /> If a series $\sum a_n$ converges, then</p>\[\lim_{n\to\infty} a_n=0\]<p>That is,</p>\[\lim_{n\to\infty} a_n \neq 0 \Rightarrow \text{series }\sum a_n \text{ diverges}\]</blockquote><h3 id="proof">Proof</h3><p>Let $l$ be the sum of a convergent series $\sum a_n$ and define the partial sum of the first $n$ terms as</p>\[s_n := a_1 + a_2 + \cdots + a_n\]<p>Then,</p>\[\forall \epsilon &gt; 0,\, \exists N \in \mathbb{N}\ (n &gt; N \Rightarrow |s_n - l| &lt; \epsilon).\]<p>Therefore, for sufficiently large $n$ (where $n &gt; N+1$, so that both $n$ and $n-1$ are greater than $N$),</p>\[|a_n| = |s_n - s_{n-1}| = |(s_n - l) - (s_{n-1} - l)| \leq |s_n - l| + |s_{n-1} - l| \leq \epsilon + \epsilon = 2\epsilon\]<p>From the definition of convergence of a sequence,</p>\[\lim_{n\to\infty} |a_n| = 0. \quad \blacksquare\]<h3 id="caution">Caution</h3><p>The converse of this theorem is generally not true. A classic example that demonstrates this is the <strong>harmonic series</strong>.</p><p>The harmonic series is a series whose terms are the reciprocals of an <strong>arithmetic sequence</strong>, forming a <strong>harmonic sequence</strong>. 
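This failure of the converse is easy to observe numerically: even though the terms $\cfrac{1}{n}$ tend to $0$, the harmonic partial sums keep growing without bound. A minimal sketch:

```python
import math

# Partial sums of the harmonic series 1 + 1/2 + ... + 1/n grow like ln(n),
# even though the general term 1/n -> 0: the n-th-term test's converse fails.
def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100_000):
    print(n, 1.0 / n, harmonic(n))

# The partial sums pass any fixed bound eventually (they are unbounded),
# while the terms 1/n shrink toward zero.
assert harmonic(100_000) > 12 > harmonic(1000) > 7 > harmonic(10)
```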
The most well-known harmonic series is</p>\[H_n := 1 + \frac{1}{2} + \cdots + \frac{1}{n} \quad (n=1,2,3,\dots)\]<p>This series diverges, as can be shown by:</p>\[\begin{align*} \lim_{n\to\infty} H_n &amp;= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \cdots + \frac{1}{16} + \cdots \\ &amp;&gt; 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{16} + \cdots + \frac{1}{16} + \cdots \\ &amp;= 1 + \frac{1}{2} \qquad\, + \frac{1}{2} \qquad\qquad\qquad\ \ + \frac{1}{2} \qquad\qquad\quad + \frac{1}{2} + \cdots \\ &amp;= \infty. \end{align*}\]<p>Thus, despite the fact that the harmonic series $H_n$ diverges, its general term $1/n$ converges to $0$.</p><blockquote class="prompt-danger"><p>If $\lim_{n\to\infty} a_n \neq 0$, then the series $\sum a_n$ must diverge, but assuming that a series $\sum a_n$ converges just because $\lim_{n\to\infty} a_n = 0$ is dangerous. In such cases, other methods must be used to determine convergence or divergence.</p></blockquote><h2 id="geometric-series">Geometric Series</h2><p>The <strong>geometric series</strong> derived from a geometric sequence with first term 1 and common ratio $r$,</p>\[1 + r + r^2 + r^3 + \cdots \label{eqn:geometric_series}\tag{5}\]<p>is <u>the most important and fundamental series</u>. 
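A quick numeric preview of the convergence behavior derived below (a sketch with an assumed sample ratio $r = 0.5$): when $|r| &lt; 1$, the partial sums approach $\cfrac{1}{1-r}$, and the error after $n$ terms is exactly $\cfrac{r^n}{1-r}$.

```python
# Partial sums of the geometric series 1 + r + r^2 + ... for |r| < 1
# approach 1/(1 - r); r = 0.5 is an assumed sample ratio.
def geometric_partial_sum(r, n):
    return sum(r**k for k in range(n))

r = 0.5
limit = 1.0 / (1.0 - r)  # = 2.0 for r = 0.5
for n in (5, 10, 20):
    s = geometric_partial_sum(r, n)
    # the remaining error is exactly r^n / (1 - r)
    assert abs((limit - s) - r**n / (1.0 - r)) < 1e-12
print("partial sums converge to 1/(1-r) =", limit)
```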
From the equation</p>\[(1-r)(1+r+\cdots + r^{n-1}) = 1 - r^n\]<p>we get</p>\[1 + r + \cdots + r^{n-1} = \frac{1-r^n}{1-r} = \frac{1}{1-r} - \frac{r^n}{1-r} \qquad (r \neq 1) \label{eqn:sum_of_geometric_series}\tag{6}\]<p>Meanwhile,</p>\[\lim_{n\to\infty} r^n = 0 \quad \Leftrightarrow \quad |r| &lt; 1\]<p>Therefore, we know that the necessary and sufficient condition for the geometric series ($\ref{eqn:geometric_series}$) to converge is $|r| &lt; 1$.</p><blockquote class="prompt-info"><p><strong>Convergence/divergence of geometric series</strong><br /> The geometric series $\sum ar^{n-1}$</p><ul><li>converges if $|r| &lt; 1$<li>diverges if $|r| \geq 1$</ul></blockquote><p>From this, we obtain</p>\[1 + r + r^2 + r^3 + \cdots = \frac{1}{1-r} \qquad (|r| &lt; 1) \label{eqn:sum_of_inf_geometric_series}\tag{7}\]<h3 id="geometric-series-and-approximations">Geometric Series and Approximations</h3><p>The identity ($\ref{eqn:sum_of_geometric_series}$) is useful for finding approximations of $\cfrac{1}{1-r}$ when $|r| &lt; 1$.</p><p>Substituting $r=-\epsilon$ and $n=2$ into this equation, we get</p>\[\frac{1}{1+\epsilon} - (1 - \epsilon) = \frac{\epsilon^2}{1 + \epsilon}\]<p>Therefore, if $0 &lt; \epsilon &lt; 1$, then</p>\[0 &lt; \frac{1}{1 + \epsilon} - (1 - \epsilon) &lt; \epsilon^2\]<p>which gives us</p>\[\frac{1}{1 + \epsilon} \approx (1 - \epsilon) \pm \epsilon^2 \qquad (0 &lt; \epsilon &lt; 1)\]<p>From this, we can see that for sufficiently small positive $\epsilon$, $\cfrac{1}{1 + \epsilon}$ can be approximated by $1 - \epsilon$.</p><h2 id="p-series-test">$p$-Series Test</h2><p>For a positive real number $p$, a series of the following form is called a <strong>$p$-series</strong>:</p>\[\sum_{n=1}^{\infty} \frac{1}{n^p}\]<blockquote class="prompt-info"><p><strong>Convergence/divergence of $p$-series</strong><br /> The $p$-series $\sum \cfrac{1}{n^p}$</p><ul><li>converges if $p&gt;1$<li>diverges if $p\leq 1$</ul></blockquote><p>When $p=1$ in a $p$-series, we get the 
harmonic series, which we’ve already shown diverges.<br /> The problem of finding the value of the $p$-series when $p=2$, i.e., $\sum \cfrac{1}{n^2}$, is known as the ‘Basel problem’, named after Basel, the hometown of the Bernoulli family, which produced several famous mathematicians over multiple generations and first proved that this series converges. The answer to this problem is known to be $\cfrac{\pi^2}{6}$.</p><p>More generally, the $p$-series with $p&gt;1$, regarded as a function of its exponent, defines the <strong>zeta function</strong>. This is a special function introduced by Leonhard Euler in 11740 <a href="https://en.wikipedia.org/wiki/Holocene_calendar">HE</a> and later named by Riemann, defined as</p>\[\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s} \qquad (s&gt;1)\]<p>This topic somewhat deviates from the main subject of this post, and frankly, as an engineering student rather than a mathematician, I don’t know much about it, so I won’t cover it here. However, Leonhard Euler showed that the zeta function can also be expressed as an infinite product over the primes, known as the <strong>Euler Product</strong>, and the zeta function has since occupied a central position in various fields of analytic number theory. The <strong>Riemann zeta function</strong>, which extends the domain of the zeta function to complex numbers, and the important unsolved problem related to it, the <strong>Riemann hypothesis</strong>, are among these.</p><p>Returning to our original topic, the proof of the $p$-series test requires the <a href="#comparison-test">Comparison Test</a> and the <a href="#integral-test">Integral Test</a>, which will be discussed later. 
However, the convergence/divergence of $p$-series can be usefully applied in the <a href="#comparison-test">Comparison Test</a> along with geometric series, which is why I’ve intentionally placed it earlier in this post.</p><h3 id="proof-1">Proof</h3><h4 id="i-when-p1">i) When $p&gt;1$</h4><p>The integral</p>\[\int_1^\infty \frac{1}{x^p}\ dx = \left[\frac{1}{-p+1}\frac{1}{x^{p-1}} \right]^\infty_1 = \frac{1}{p-1}\]<p>converges, so by the <a href="#integral-test">Integral Test</a>, the series $\sum \cfrac{1}{n^p}$ also converges.</p><h4 id="ii-when-pleq-1">ii) When $p\leq 1$</h4><p>In this case,</p>\[0 \leq \frac{1}{n} \leq \frac{1}{n^p}\]<p>Since we know that the harmonic series $\sum \cfrac{1}{n}$ diverges, by the <a href="#comparison-test">Comparison Test</a>, $\sum \cfrac{1}{n^p}$ also diverges.</p><h4 id="conclusion">Conclusion</h4><p>By i) and ii), the $p$-series $\sum \cfrac{1}{n^p}$ converges if $p&gt;1$ and diverges if $p \leq 1$. $\blacksquare$</p><h2 id="comparison-test">Comparison Test</h2><p>Jakob Bernoulli’s <strong>Comparison Test</strong> is useful for determining the convergence/divergence of a <strong>series of positive terms</strong>, where each term is a non-negative real number.</p><p>Since a series of positive terms forms an increasing sequence, it must converge unless it diverges to infinity ($\sum a_n = \infty$). 
Therefore, in a series of positive terms, the expression</p>\[\sum a_n &lt; \infty\]<p>means that <u>the series converges</u>.</p><blockquote class="prompt-info"><p><strong>Comparison Test</strong><br /> If $0 \leq a_n \leq b_n$, then</p><ul><li>$\sum b_n &lt; \infty \ \Rightarrow \ \sum a_n &lt; \infty$<li>$\sum a_n = \infty \ \Rightarrow \ \sum b_n = \infty$</ul></blockquote><p>In particular, for series of positive terms that have forms similar to the geometric series $\sum ar^{n-1}$ or the $p$-series $\sum \cfrac{1}{n^p}$ that we’ve examined earlier, such as $\sum \cfrac{1}{n^2 + n}$, $\sum \cfrac{\log n}{n^3}$, $\sum \cfrac{1}{2^n + 3^n}$, $\sum \cfrac{1}{\sqrt{n}}$, $\sum \sin{\cfrac{1}{n}}$, it’s a good idea to actively try the Comparison Test.</p><p>All the other convergence/divergence tests that will be discussed later can be derived from this <strong>Comparison Test</strong>, making it arguably the most important test.</p><h3 id="limit-comparison-test">Limit Comparison Test</h3><p>For series of positive terms $\sum a_n$ and $\sum b_n$, if the dominant terms in the numerator and denominator of the ratio $a_n/b_n$ cancel out, resulting in $\lim_{n\to\infty} \cfrac{a_n}{b_n}=c \text{ (}c\text{ is a finite positive number)}$, and if we know whether the series $\sum b_n$ converges or diverges, then we can use the following <strong>Limit Comparison Test</strong>.</p><blockquote class="prompt-info"><p><strong>Limit Comparison Test</strong><br /> If</p>\[\lim_{n\to\infty} \frac{a_n}{b_n} = c \text{ (}c\text{ is a finite positive number)}\]<p>then both series $\sum a_n$ and $\sum b_n$ either both converge or both diverge. 
That is, $ \sum a_n &lt; \infty \ \Leftrightarrow \ \sum b_n &lt; \infty$.</p></blockquote><h2 id="root-test">Root Test</h2><blockquote class="prompt-info"><p><strong>Theorem</strong><br /> For a series of positive terms $\sum a_n$ and a positive number $\epsilon &lt; 1$</p><ul><li>If $\sqrt[n]{a_n}&lt; 1-\epsilon$ for all $n$, then the series $\sum a_n$ converges<li>If $\sqrt[n]{a_n}&gt; 1+\epsilon$ for all $n$, then the series $\sum a_n$ diverges</ul></blockquote><blockquote class="prompt-info"><p><strong>Corollary: Root Test</strong><br /> For a series of positive terms $\sum a_n$, if the limit</p>\[\lim_{n\to\infty} \sqrt[n]{a_n} =: r\]<p>exists, then</p><ul><li>the series $\sum a_n$ converges if $r&lt;1$<li>the series $\sum a_n$ diverges if $r&gt;1$</ul></blockquote><blockquote class="prompt-warning"><p>In the corollary above, if $r=1$, the test is inconclusive, and other methods must be used to determine convergence or divergence.</p></blockquote><h2 id="ratio-test">Ratio Test</h2><blockquote class="prompt-info"><p><strong>Ratio Test</strong><br /> For a sequence of positive terms $(a_n)$ and $0 &lt; r &lt; 1$</p><ul><li>If $a_{n+1}/a_n \leq r$ for all $n$, then the series $\sum a_n$ converges<li>If $a_{n+1}/a_n \geq 1$ for all $n$, then the series $\sum a_n$ diverges</ul></blockquote><blockquote class="prompt-info"><p><strong>Corollary</strong><br /> For a sequence of positive terms $(a_n)$, if the limit $\rho := \lim_{n\to\infty} \cfrac{a_{n+1}}{a_n}$ exists, then</p><ul><li>the series $\sum a_n$ converges if $\rho &lt; 1$<li>the series $\sum a_n$ diverges if $\rho &gt; 1$</ul></blockquote><blockquote class="prompt-warning"><p>As with the root test, if $\rho = 1$, the test is inconclusive, and other methods must be used to determine convergence or divergence.</p></blockquote><h2 id="integral-test">Integral Test</h2><p>Integration can be used to determine the convergence/divergence of a series composed of a decreasing sequence of positive terms.</p><blockquote class="prompt-info"><p><strong>Integral Test</strong><br /> For a continuous, decreasing function $f: \left[1,\infty \right) \rightarrow \mathbb{R}$ with $f(x)&gt;0$ for all $x$, the 
series $\sum f(n)$ converges if and only if the integral</p>\[\int_1^\infty f(x)\ dx := \lim_{b\to\infty} \int_1^b f(x)\ dx\]<p>converges.</p></blockquote><h3 id="proof-2">Proof</h3><p>Since the function $f(x)$ is continuous, decreasing, and always positive, the inequality</p>\[f(n+1) \leq \int_n^{n+1} f(x)\ dx \leq f(n)\]<p>holds. Summing these inequalities from the first one up to the $n$th, we get</p>\[f(2) + \cdots + f(n+1) \leq \int_1^{n+1} f(x)\ dx \leq f(1) + \cdots + f(n)\]<p>Now, using the <a href="#comparison-test">Comparison Test</a>, we obtain the desired result. $\blacksquare$</p><h2 id="alternating-series">Alternating Series</h2><p>A series $\sum a_n$ where each term $a_n$ is non-zero and has a sign opposite to that of the next term $a_{n+1}$, i.e., where positive and negative terms alternate, is called an <strong>alternating series</strong>.</p><p>For alternating series, the following theorem discovered by the German mathematician Gottfried Wilhelm Leibniz can be usefully applied to determine convergence/divergence.</p><blockquote class="prompt-info"><p><strong>Alternating Series Test</strong><br /> If</p><ol><li>$a_n$ and $a_{n+1}$ have opposite signs for all $n$,<li>$|a_n| \geq |a_{n+1}|$ for all $n$, and<li>$\lim_{n\to\infty} a_n = 0$,</ol><p>then the alternating series $\sum a_n$ converges.</p></blockquote><h2 id="absolute-convergence">Absolute Convergence</h2><p>For a series $\sum a_n$, if the series $\sum |a_n|$ converges, we say that “the series $\sum a_n$ <strong>converges absolutely</strong>.”</p><p>The following theorem holds:</p><blockquote class="prompt-info"><p><strong>Theorem</strong><br /> A series that converges absolutely also converges.</p></blockquote><blockquote class="prompt-warning"><p>The converse of the above theorem is not true.<br /> If a series converges but does not converge absolutely, we say it “<strong>converges conditionally</strong>.”</p></blockquote><h3 id="proof-3">Proof</h3><p>For a real number $a$, 
define</p>\[\begin{align*} a^+ &amp;:= \max\{a,0\} = \frac{1}{2}(|a| + a), \\ a^- &amp;:= -\min\{a,0\} = \frac{1}{2}(|a| - a) \end{align*}\]<p>Then,</p>\[a = a^+ - a^-, \qquad |a| = a^+ + a^-\]<p>Since $0 \leq a^\pm \leq |a|$, by the <a href="#comparison-test">Comparison Test</a>, if the series $\sum |a_n|$ converges, then both series $\sum a_n^+$ and $\sum a_n^-$ also converge, and therefore, by the <a href="/posts/sequences-and-series/#basic-properties-of-convergent-series">basic properties of convergent series</a>,</p>\[\sum a_n = \sum (a_n^+ - a_n^-) = \sum a_n^+ - \sum a_n^-\]<p>also converges. $\blacksquare$</p>]]> </content> </entry> <entry><title xml:lang="en">Sequences and Series</title><link href="https://www.yunseo.kim/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/sequences-and-series/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-03-16T00:00:00+09:00</published> <updated>2025-05-13T16:24:07+09:00</updated> 
<id>https://www.yunseo.kim/posts/sequences-and-series/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Calculus" /> <summary xml:lang="en">We examine fundamental concepts of calculus such as the definition of sequences and series, convergence and divergence of sequences, convergence and divergence of series, and the definition of e, the base of natural logarithm.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>We examine fundamental concepts of calculus such as the definition of sequences and series, convergence and divergence of sequences, convergence and divergence of series, and the definition of e, the base of natural logarithm.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="sequences">Sequences</h2><p>In calculus, a <strong>sequence</strong> primarily refers to an infinite sequence. That is, a sequence is a function defined on the set of all <strong>natural numbers</strong></p>\[\mathbb{N} := \{1,2,3,\dots\}\]<ul><li>If the values of this function are real numbers, we call it a ‘real sequence’; if complex numbers, a ‘complex sequence’; if points, a ‘point sequence’; if matrices, a ‘matrix sequence’; if functions, a ‘function sequence’; if sets, a ‘set sequence’. 
However, all of these can be simply referred to as ‘sequences’.</ul><p>Usually, for the <strong>field of real numbers</strong> $\mathbb{R}$, in a sequence $\mathbf{a}: \mathbb{N} \to \mathbb{R}$, we denote</p>\[a_1 := \mathbf{a}(1), \quad a_2 := \mathbf{a}(2), \quad a_3 := \mathbf{a}(3)\]<p>and represent this sequence as</p>\[a_1,\, a_2,\, a_3,\, \dots\]<p>or</p>\[\begin{gather*} (a_1,a_2,a_3,\dots), \\ (a_n: n=1,2,3,\dots), \\ (a_n)_{n=1}^{\infty}, \qquad (a_n) \end{gather*}\]<blockquote class="prompt-info"><p>*In the process of defining a sequence, instead of using the set of all natural numbers $\mathbb{N}$ as the domain, we can also use the set of non-negative integers</p>\[\mathbb{N}_0 := \{0\} \cup \mathbb{N} = \{0,1,2,\dots\}\]<p>or</p>\[\{2,3,4,\dots \}\]<p>For example, when dealing with power series theory, it’s more natural to have $\mathbb{N}_0$ as the domain.</p></blockquote><h2 id="convergence-and-divergence">Convergence and Divergence</h2><p>If a sequence $(a_n)$ converges to a real number $l$, we write</p>\[\lim_{n\to \infty} a_n = l\]<p>and call $l$ the <strong>limit</strong> of the sequence $(a_n)$.</p><blockquote class="prompt-info"><p>The rigorous definition using the <strong>epsilon-delta argument</strong> is as follows:</p>\[\lim_{n\to \infty} a_n = l \overset{def}\Longleftrightarrow \forall \epsilon &gt; 0,\, \exists N \in \mathbb{N}\ (n &gt; N \Rightarrow |a_n - l| &lt; \epsilon)\]<p>In other words, if for any positive $\epsilon$, there always exists a natural number $N$ such that $|a_n - l | &lt; \epsilon$ when $n&gt;N$, it means that the difference between $a_n$ and $l$ becomes infinitely small for sufficiently large $n$. Therefore, we define that a sequence $(a_n)$ satisfying this condition converges to the real number $l$.</p></blockquote><p>A sequence that does not converge is said to <strong>diverge</strong>. 
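The epsilon–$N$ definition above can be made concrete with a small script. This is only a sketch; the example sequence $a_n = (2n+1)/n \to 2$ and the tolerance values are my own choices, not from the post:

```python
import math

# The epsilon-N definition in action for a_n = (2n + 1)/n, whose limit is 2.
# Since |a_n - 2| = 1/n, the threshold N = floor(1/epsilon) suffices.

def a(n: int) -> float:
    return (2 * n + 1) / n

for eps in (0.5, 0.1, 0.01):  # arbitrary tolerance choices
    N = math.floor(1 / eps)
    # every term beyond N is within eps of the limit (spot-check a stretch of them)
    assert all(abs(a(n) - 2) < eps for n in range(N + 1, N + 1000))
    print(f"eps = {eps}: N = {N}, |a_{{N+1}} - 2| = {abs(a(N + 1) - 2):.4f}")
```

Shrinking $\epsilon$ only pushes $N$ further out; it never fails, which is exactly what convergence means.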
<em>The convergence or divergence of a sequence does not change even if a finite number of its terms are altered.</em></p><p>If the terms of the sequence $(a_n)$ eventually grow beyond every bound, we write</p>\[\lim_{n\to \infty} a_n = \infty\]<p>and say that it <em>diverges to positive infinity</em>. Similarly, if the terms of the sequence $(a_n)$ eventually fall below every bound, we write</p>\[\lim_{n\to \infty} a_n = -\infty\]<p>and say that it <em>diverges to negative infinity</em>.</p><h2 id="basic-properties-of-convergent-sequences">Basic Properties of Convergent Sequences</h2><p>If sequences $(a_n)$ and $(b_n)$ both converge (i.e., have limits), then the sequences $(a_n + b_n)$ and $(a_n \cdot b_n)$ also converge, and</p>\[\lim_{n\to \infty} (a_n + b_n) = \lim_{n\to \infty} a_n + \lim_{n\to \infty} b_n \label{eqn:props_of_conv_series_1}\tag{1}\] \[\lim_{n\to \infty} (a_n \cdot b_n) = \left(\lim_{n\to \infty} a_n \right) \cdot \left(\lim_{n\to \infty} b_n \right) \label{eqn:props_of_conv_series_2}\tag{2}\]<p>Also, for any real number $t$,</p>\[\lim_{n\to \infty} (t a_n) = t\left(\lim_{n\to \infty} a_n \right) \label{eqn:props_of_conv_series_3}\tag{3}\]<p>These properties are called the <strong>basic properties of convergent sequences</strong> or <strong>basic properties of limits</strong>.</p><h2 id="e-the-base-of-natural-logarithm">$e$, the Base of Natural Logarithm</h2><p><strong>The base of natural logarithm</strong> is defined as</p>\[e := \lim_{n\to \infty} \left(1+\frac{1}{n} \right)^n \approx 2.718\]<p>This is considered one of the most important constants in mathematics.</p><blockquote class="prompt-tip"><p>The term ‘natural constant’ is widely used only in Korea, but this is not a standard term. 
The official term registered in the mathematics terminology dictionary by the Korean Mathematical Society is <a href="https://www.kms.or.kr/mathdict/list.html?key=kname&amp;keyword=%EC%9E%90%EC%97%B0%EB%A1%9C%EA%B7%B8%EC%9D%98+%EB%B0%91">‘base of natural logarithm’</a>, and the expression ‘natural constant’ cannot be found in this dictionary. Even in the Standard Korean Language Dictionary of the National Institute of Korean Language, the word ‘natural constant’ cannot be found, and in the <a href="https://stdict.korean.go.kr/search/searchView.do?pageSize=10&amp;searchKeyword=%EC%9E%90%EC%97%B0%EB%A1%9C%EA%B7%B8">dictionary definition of ‘natural logarithm’</a>, it only mentions “a specific number usually denoted as e”.<br /> In English-speaking countries and Japan, there is no corresponding term, and in English, it’s mainly referred to as ‘the base of the natural logarithm’ or shortened to ‘natural base’, or ‘Euler’s number’ or ‘the number $e$’.<br /> Since the origin is unclear and it has never been recognized as an official term by the Korean Mathematical Society, and it’s not used anywhere else in the world except Korea, there’s no reason to insist on using such a term. Therefore, from now on, I will refer to it as ‘the base of natural logarithm’ or simply denote it as $e$.</p></blockquote><h2 id="series">Series</h2><p>For a sequence</p>\[\mathbf{a} = (a_1, a_2, a_3, \dots)\]<p>the sequence of partial sums</p>\[a_1, \quad a_1 + a_2, \quad a_1 + a_2 + a_3, \quad \dots\]<p>is called the <strong>series</strong> of the sequence $\mathbf{a}$. 
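The sequence of partial sums can be generated directly; a minimal sketch (the example sequence $a_n = 1/2^n$ is my own choice for illustration):

```python
from itertools import accumulate, count, islice

# Build the series (i.e., the sequence of partial sums) of a_n = 1/2^n, n = 1, 2, 3, ...
terms = (1 / 2**n for n in count(1))
partial_sums = list(islice(accumulate(terms), 10))

print(partial_sums)  # 0.5, 0.75, 0.875, ... approaching the sum 1
```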
The series of the sequence $(a_n)$ is denoted as</p>\[\begin{gather*} a_1 + a_2 + a_3 + \cdots, \qquad \sum_{n=1}^{\infty}a_n, \\ \sum_{n\geq 1} a_n, \qquad \sum_n a_n, \qquad \sum a_n \end{gather*}\]<h2 id="convergence-and-divergence-of-series">Convergence and Divergence of Series</h2><p>If the series obtained from the sequence $(a_n)$</p>\[a_1, \quad a_1 + a_2, \quad a_1 + a_2 + a_3, \quad \dots\]<p>converges to some real number $l$, we write</p>\[\sum_{n=1}^{\infty} a_n = l\]<p>The limit value $l$ is called the <strong>sum</strong> of the series $\sum a_n$. The symbol</p>\[\sum a_n\]<p>can represent either the <u>series</u> or the <u>sum of the series</u>, depending on the context.</p><p>A series that does not converge is said to <strong>diverge</strong>.</p><h2 id="basic-properties-of-convergent-series">Basic Properties of Convergent Series</h2><p>From the <a href="#basic-properties-of-convergent-sequences">basic properties of convergent sequences</a>, we obtain the following basic properties of convergent series. For a real number $t$ and two convergent series $\sum a_n$, $\sum b_n$,</p>\[\sum(a_n + b_n) = \sum a_n + \sum b_n, \qquad \sum ta_n = t\sum a_n \tag{4}\]<p>The convergence of a series is not affected by changes in a finite number of terms. 
That is, if $a_n=b_n$ for all but finitely many $n$ in two sequences $(a_n)$, $(b_n)$, the series $\sum a_n$ converges if and only if the series $\sum b_n$ converges.</p>]]> </content> </entry> <entry><title xml:lang="en">Newton's Laws of Motion</title><link href="https://www.yunseo.kim/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/newtons-laws-of-motion/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-03-10T00:00:00+09:00</published> <updated>2026-02-16T05:09:10+09:00</updated> <id>https://www.yunseo.kim/posts/newtons-laws-of-motion/</id> <author> <name>Yunseo Kim</name> </author> <category term="Physics" /> <category term="Classical Dynamics" /> <summary xml:lang="en">We explore Newton&apos;s laws of motion, the meaning of these three laws, and the definitions of inertial mass and gravitational mass, as well as the principle of equivalence, which holds significant importance not only in classical mechanics but also in the later 
theory of general relativity.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>We explore Newton's laws of motion, the meaning of these three laws, and the definitions of inertial mass and gravitational mass, as well as the principle of equivalence, which holds significant importance not only in classical mechanics but also in the later theory of general relativity.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><p><strong>Newton’s Laws of Motion</strong></p><ol><li>A body remains at rest or in uniform linear motion unless acted upon by an external force.<li>The rate of change of momentum of a body is equal to the force applied to it.<ul><li>$\vec{F} = \cfrac{d\vec{p}}{dt} = \cfrac{d}{dt}(m\vec{v}) = m\vec{a}$</ul><li>When two bodies exert forces on each other, these forces are equal in magnitude and opposite in direction.<ul><li>$\vec{F_1} = -\vec{F_2}$</ul></ol></blockquote><blockquote class="prompt-info"><p><strong>Principle of Equivalence</strong></p><ul><li>Inertial mass: The mass that determines a body’s acceleration when a given force is applied<li>Gravitational mass: The mass that determines the gravitational force between a body and other bodies<li>Currently, inertial mass and gravitational mass are known to clearly agree within an error range of about $10^{-12}$<li>The assertion that inertial mass and gravitational mass are exactly equal is called the <strong>principle of equivalence</strong></ul></blockquote><h2 id="newtons-laws-of-motion">Newton’s Laws of Motion</h2><p>Newton’s laws of motion are three laws published by Isaac Newton in his work Philosophiæ Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy, abbreviated as ‘Principia’) in the year 11687 of the <a href="https://en.wikipedia.org/wiki/Holocene_calendar">Holocene calendar</a>. 
These laws form the foundation of Newtonian mechanics.</p><ol><li>A body remains at rest or in uniform linear motion unless acted upon by an external force.<li>The rate of change of momentum of a body is equal to the force applied to it.<li>When two bodies exert forces on each other, these forces are equal in magnitude and opposite in direction.</ol><h3 id="newtons-first-law">Newton’s First Law</h3><blockquote><p>I. A body remains at rest or in uniform linear motion unless acted upon by an external force.</p></blockquote><p>A body in such a state, with no external forces acting upon it, is called a <strong>free body</strong> or a <strong>free particle</strong>. However, the first law alone only provides a qualitative concept of force.</p><h3 id="newtons-second-law">Newton’s Second Law</h3><blockquote><p>II. The rate of change of momentum of a body is equal to the force applied to it.</p></blockquote><p>Newton defined <strong>momentum</strong> as the product of mass and velocity:</p>\[\vec{p} \equiv m\vec{v} \label{eqn:momentum}\tag{1}\]<p>From this, Newton’s second law can be expressed as:</p>\[\vec{F} = \frac{d\vec{p}}{dt} = \frac{d}{dt}(m\vec{v}) = m\vec{a}. \label{eqn:2nd_law}\tag{2}\]<p>Despite their names, Newton’s first and second laws are actually closer to ‘definitions’ of force rather than ‘laws’. Also, we can see that the definition of force depends on the definition of ‘mass’.</p><h3 id="newtons-third-law">Newton’s Third Law</h3><blockquote><p>III. When two bodies exert forces on each other, these forces are equal in magnitude and opposite in direction.</p></blockquote><p>This is also known as the ‘law of action and reaction’ and applies when the force exerted by one body on another is directed along the line connecting the two points of action. Such forces are called <strong>central forces</strong>, and the third law holds regardless of whether the central force is attractive or repulsive. 
Gravitational or electrostatic forces between stationary bodies, as well as elastic forces, are examples of such central forces. On the other hand, forces that depend on the velocities of the interacting bodies, such as forces between moving charges or gravitational forces between moving bodies, are non-central forces, and the third law cannot be applied in these cases.</p><p>Incorporating the definition of mass we examined earlier, the third law can be restated as:</p><blockquote><p>III$^\prime$. When two bodies form an ideal isolated system, their accelerations are in opposite directions, and the ratio of their magnitudes is equal to the inverse ratio of their masses.</p></blockquote><p>By Newton’s third law:</p>\[\vec{F_1} = -\vec{F_2} \label{eqn:3rd_law}\tag{3}\]<p>Substituting the second law ($\ref{eqn:2nd_law}$) into this:</p>\[\frac{d\vec{p_1}}{dt} = -\frac{d\vec{p_2}}{dt} \label{eqn:3rd-1_law}\tag{4}\]<p>From this, we can see that momentum is conserved in isolated interactions between two particles:</p>\[\frac{d}{dt}(\vec{p_1}+\vec{p_2}) = 0 \label{eqn:conservation_of_momentum}\tag{5}\]<p>Also, from equation ($\ref{eqn:3rd-1_law}$), since $\vec{p}=m\vec{v}$ and mass $m$ is constant:</p>\[m_1\left(\frac{d\vec{v_1}}{dt} \right) = m_2\left(-\frac{d\vec{v_2}}{dt} \right) \tag{6a}\] \[m_1(\vec{a_1}) = m_2(-\vec{a_2}) \tag{6b}\]<p>This gives us:</p>\[\frac{m_2}{m_1} = -\frac{a_1}{a_2}. \tag{7}\]<p>Although Newton’s third law describes the case where two bodies form an isolated system, it is actually impossible to realize such ideal conditions in reality, so Newton’s assertion in the third law could be considered somewhat audacious. Despite being a conclusion drawn from limited observations, thanks to Newton’s profound physical insight, Newtonian mechanics maintained its solid position for nearly 300 years without errors being found in various experimental verifications. 
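The conservation of momentum in equation (5) is easy to check numerically for an isolated two-body system. The sketch below is illustrative only; the masses, the internal spring-like force, and the step size are all invented values:

```python
# Two bodies interacting through an internal force obeying F1 = -F2 (third law).
# The total momentum p1 + p2 should stay constant throughout the integration.

m1, m2 = 2.0, 5.0             # arbitrary masses
x1, x2 = 0.0, 1.0             # 1-D positions
v1, v2 = 0.3, -0.1            # initial velocities
k, rest, dt = 4.0, 1.0, 1e-4  # made-up spring constant, rest length, time step

p0 = m1 * v1 + m2 * v2        # initial total momentum

for _ in range(100_000):
    f = k * ((x2 - x1) - rest)  # force on body 1; body 2 feels -f
    v1 += (f / m1) * dt         # symplectic Euler: velocities first,
    v2 += (-f / m2) * dt        # then positions
    x1 += v1 * dt
    x2 += v2 * dt

print(abs(m1 * v1 + m2 * v2 - p0))  # ~0, up to floating-point rounding
```

Because the two velocity updates add $f\,dt$ and $-f\,dt$ to the respective momenta, the total momentum is conserved at every step, mirroring equation (5).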
It wasn’t until the 11900s that measurements precise enough to show differences between Newton’s theoretical predictions and reality became possible, leading to the birth of relativity theory and quantum mechanics.</p><h2 id="inertial-mass-and-gravitational-mass">Inertial Mass and Gravitational Mass</h2><p>One method of determining the mass of an object is to compare its weight with a standard weight using a tool like a balance. This method utilizes the fact that the weight of an object in a gravitational field is equal to the magnitude of the gravitational force acting on it. In this case, the second law $\vec{F}=m\vec{a}$ takes the form $\vec{W}=m\vec{g}$. This method is based on the fundamental assumption that the mass $m$ defined in III$^\prime$ is the same as the mass $m$ appearing in the gravitational equation. These two masses are called <strong>inertial mass</strong> and <strong>gravitational mass</strong>, respectively, and are defined as follows:</p><ul><li>Inertial mass: The mass that determines a body’s acceleration when a given force is applied<li>Gravitational mass: The mass that determines the gravitational force between a body and other bodies</ul><p>Although the Leaning Tower of Pisa experiment is a story fabricated by later generations and has no real connection to Galileo Galilei, it was the first well-known thought experiment suggesting that inertial mass and gravitational mass are the same. 
Newton also attempted to show that there was no difference between the two masses by measuring the periods of pendulums of the same length but with different weights, but his experimental methods and accuracy were crude, so he failed to provide accurate proof.</p><p>Later, in the late 11800s, Hungarian physicist Eötvös Loránd Ágoston performed the Eötvös experiment to accurately measure the difference between inertial mass and gravitational mass, proving their identity with considerable accuracy (within an error of 1 in 20 million).</p><p>More recent experiments conducted by Robert Henry Dicke and others have further increased the accuracy, and currently, inertial mass and gravitational mass are known to be clearly identical within an error range of about $10^{-12}$. This result has extremely important implications in the general theory of relativity, and the assertion that inertial mass and gravitational mass are exactly equal is called the <strong>principle of equivalence</strong>.</p>]]> </content> </entry> <entry><title xml:lang="en">Homogeneous Linear ODEs of Second Order with Constant Coefficients</title><link href="https://www.yunseo.kim/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" 
hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/homogeneous-linear-odes-with-constant-coefficients/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-02-22T00:00:00+09:00</published> <updated>2025-07-11T21:22:11+09:00</updated> <id>https://www.yunseo.kim/posts/homogeneous-linear-odes-with-constant-coefficients/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Differential Equation" /> <summary xml:lang="en">Explore how the general solution of a second-order homogeneous linear ODE with constant coefficients changes based on the roots of its characteristic equation.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Explore how the general solution of a second-order homogeneous linear ODE with constant coefficients changes based on the roots of its characteristic equation.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li>Second-order homogeneous linear ODE with constant coefficients: $y^{\prime\prime} + ay^{\prime} + by = 0$<li><strong>Characteristic equation</strong>: $\lambda^2 + a\lambda + b = 0$<li>Depending on the sign of the discriminant $a^2 - 4b$ of the characteristic equation, the form of the general solution can be divided into three cases as shown in the table:</ul><table><thead><tr><th style="text-align: center">Case<th style="text-align: center">Roots of Characteristic Equation<th 
style="text-align: center">Basis of the ODE’s Solution<th style="text-align: center">General Solution of the ODE<tbody><tr><td style="text-align: center">I<td style="text-align: center">Distinct real roots<br />$\lambda_1$, $\lambda_2$<td style="text-align: center">$e^{\lambda_1 x}$, $e^{\lambda_2 x}$<td style="text-align: center">$y = c_1e^{\lambda_1 x} + c_2e^{\lambda_2 x}$<tr><td style="text-align: center">II<td style="text-align: center">Real double root<br /> $\lambda = -\cfrac{1}{2}a$<td style="text-align: center">$e^{-ax/2}$, $xe^{-ax/2}$<td style="text-align: center">$y = (c_1 + c_2 x)e^{-ax/2}$<tr><td style="text-align: center">III<td style="text-align: center">Complex conjugate roots<br /> $\lambda_1 = -\cfrac{1}{2}a + i\omega$, <br /> $\lambda_2 = -\cfrac{1}{2}a - i\omega$<td style="text-align: center">$e^{-ax/2}\cos{\omega x}$, <br /> $e^{-ax/2}\sin{\omega x}$<td style="text-align: center">$y = e^{-ax/2}(A\cos{\omega x} + B\sin{\omega x})$</table></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/Bernoulli-Equation/">Bernoulli Equation</a><li><a href="/posts/homogeneous-linear-odes-of-second-order/">Homogeneous Linear ODEs of Second Order</a><li>Euler’s formula</ul><h2 id="characteristic-equation">Characteristic Equation</h2><p>Let’s consider a second-order homogeneous linear ordinary differential equation with constant coefficients $a$ and $b$:</p>\[y^{\prime\prime} + ay^{\prime} + by = 0 \label{eqn:ode_with_constant_coefficients}\tag{1}\]<p>This type of equation has important applications in mechanical and electrical vibrations.</p><p>We have previously found the general solution of the logistic equation in <a href="/posts/Bernoulli-Equation/">Bernoulli Equation</a>, and according to it, the solution to the first-order linear ODE with a constant coefficient $k$,</p>\[y^\prime + ky = 0\]<p>is the exponential function $y = ce^{-kx}$ (the case where $A=-k$ and $B=0$ in equation (4) of that post).</p><p>Therefore, for a 
similarly shaped equation like ($\ref{eqn:ode_with_constant_coefficients}$), we can first try a solution of the form</p>\[y=e^{\lambda x}\label{eqn:general_sol}\tag{2}\]<blockquote class="prompt-info"><p>Of course, this is merely a guess, and there is no guarantee that the general solution will actually have this form. However, if we can find any two linearly independent solutions, we can obtain the general solution by the <a href="/posts/homogeneous-linear-odes-of-second-order/#superposition-principle">superposition principle</a>, as we saw in <a href="/posts/homogeneous-linear-odes-of-second-order/#basis-and-general-solution">Homogeneous Linear ODEs of Second Order</a>.<br /> As we will see shortly, <a href="#ii-real-double-root-lambda---cfraca2">there are also cases where we need to find a different form of solution</a>.</p></blockquote><p>Substituting Eq. ($\ref{eqn:general_sol}$) and its derivatives</p>\[y^\prime = \lambda e^{\lambda x}, \quad y^{\prime\prime} = \lambda^2 e^{\lambda x}\]<p>into Eq. ($\ref{eqn:ode_with_constant_coefficients}$) gives</p>\[(\lambda^2 + a\lambda + b)e^{\lambda x} = 0\]<p>Therefore, if $\lambda$ is a root of the <strong>characteristic equation</strong></p>\[\lambda^2 + a\lambda + b = 0 \label{eqn:characteristic_eqn}\tag{3}\]<p>then the exponential function ($\ref{eqn:general_sol}$) is a solution to the ordinary differential equation ($\ref{eqn:ode_with_constant_coefficients}$). 
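For example, for $y^{\prime\prime} + y^{\prime} - 2y = 0$, the characteristic equation is $\lambda^2 + \lambda - 2 = (\lambda - 1)(\lambda + 2) = 0$, with roots $\lambda_1 = 1$ and $\lambda_2 = -2$; substituting back confirms that $e^{x}$ and $e^{-2x}$ each satisfy the ODE.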
Solving the quadratic equation ($\ref{eqn:characteristic_eqn}$) gives</p>\[\begin{align*} \lambda_1 &amp;= \frac{1}{2}\left(-a + \sqrt{a^2 - 4b}\right), \\ \lambda_2 &amp;= \frac{1}{2}\left(-a - \sqrt{a^2 - 4b}\right) \end{align*}\label{eqn:lambdas}\tag{4}\]<p>and from this, the two functions</p>\[y_1 = e^{\lambda_1 x}, \quad y_2 = e^{\lambda_2 x} \tag{5}\]<p>become solutions to equation ($\ref{eqn:ode_with_constant_coefficients}$).</p><blockquote class="prompt-tip"><p>The terms <strong>characteristic equation</strong> and <strong>auxiliary equation</strong> are often used interchangeably; they mean exactly the same thing. You can use either term.</p></blockquote><p>Now, we can divide the problem into three cases depending on the sign of the discriminant $a^2 - 4b$ of the characteristic equation ($\ref{eqn:characteristic_eqn}$).</p><ul><li>$a^2 - 4b &gt; 0$: Distinct real roots<li>$a^2 - 4b = 0$: Real double root<li>$a^2 - 4b &lt; 0$: Complex conjugate roots</ul><h2 id="form-of-the-general-solution-based-on-the-sign-of-the-discriminant">Form of the General Solution based on the Sign of the Discriminant</h2><h3 id="i-distinct-real-roots-lambda_1-and-lambda_2">I. Distinct Real Roots $\lambda_1$ and $\lambda_2$</h3><p>In this case, a basis of solutions for equation ($\ref{eqn:ode_with_constant_coefficients}$) on any interval is</p>\[y_1 = e^{\lambda_1 x}, \quad y_2 = e^{\lambda_2 x}\]<p>and the corresponding general solution is</p>\[y = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} \label{eqn:general_sol_1}\tag{6}\]<h3 id="ii-real-double-root-lambda---cfraca2">II. Real Double Root $\lambda = -\cfrac{a}{2}$</h3><p>If $a^2 - 4b = 0$, the quadratic equation ($\ref{eqn:characteristic_eqn}$) yields only one root $\lambda = \lambda_1 = \lambda_2 = -\cfrac{a}{2}$. 
Therefore, the only solution of the form $y = e^{\lambda x}$ we can obtain is</p>\[y_1 = e^{-(a/2)x}\]<p>To obtain a basis, we need to find a second solution $y_2$ that is linearly independent of $y_1$.</p><p>In this situation, we can use <a href="/posts/homogeneous-linear-odes-of-second-order/#reduction-of-order">reduction of order</a>, which we have discussed before. We set the second solution we are looking for as $y_2=uy_1$, and substitute</p>\[\begin{align*} y_2 &amp;= uy_1, \\ y_2^{\prime} &amp;= u^{\prime}y_1 + uy_1^{\prime}, \\ y_2^{\prime\prime} &amp;= u^{\prime\prime}y_1 + 2u^{\prime}y_1^{\prime} + uy_1^{\prime\prime} \end{align*}\]<p>into equation ($\ref{eqn:ode_with_constant_coefficients}$) to get</p>\[(u^{\prime\prime}y_1 + 2u^\prime y_1^\prime + uy_1^{\prime\prime}) + a(u^\prime y_1 + uy_1^\prime) + buy_1 = 0\]<p>Grouping the terms by $u^{\prime\prime}$, $u^\prime$, and $u$ gives</p>\[y_1u^{\prime\prime} + (2y_1^\prime + ay_1)u^\prime + (y_1^{\prime\prime} + ay_1^\prime + by_1)u = 0\]<p>Here, since $y_1$ is a solution to equation ($\ref{eqn:ode_with_constant_coefficients}$), the expression in the last parenthesis is $0$. Also, since</p>\[2y_1^\prime = -ae^{-ax/2} = -ay_1\]<p>the expression in the first parenthesis is also $0$. Thus, only $u^{\prime\prime}y_1 = 0$ remains, which implies $u^{\prime\prime}=0$. Integrating twice gives $u = c_1x + c_2$. Since the integration constants $c_1$ and $c_2$ can be any value, we can simply choose $c_1=1$ and $c_2=0$ to set $u=x$. Then we have $y_2 = uy_1 = xy_1$. Since $y_1$ and $y_2$ are linearly independent, they form a basis. 
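As a quick check, consider $y^{\prime\prime} + 4y^{\prime} + 4y = 0$, whose characteristic equation $\lambda^2 + 4\lambda + 4 = (\lambda + 2)^2 = 0$ has the double root $\lambda = -2$: for $y_2 = xe^{-2x}$ we get $y_2^{\prime} = (1 - 2x)e^{-2x}$ and $y_2^{\prime\prime} = (4x - 4)e^{-2x}$, so $y_2^{\prime\prime} + 4y_2^{\prime} + 4y_2 = \big((4x - 4) + 4(1 - 2x) + 4x\big)e^{-2x} = 0$, as expected.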
Therefore, when the characteristic equation ($\ref{eqn:characteristic_eqn}$) has a double root, a basis of solutions for equation ($\ref{eqn:ode_with_constant_coefficients}$) on any interval is</p>\[e^{-ax/2}, \quad xe^{-ax/2}\]<p>and the corresponding general solution is</p>\[y = (c_1 + c_2x)e^{-ax/2} \label{eqn:general_sol_2}\tag{7}\]<h3 id="iii-complex-conjugate-roots--cfrac12a--iomega-and--cfrac12a---iomega">III. Complex Conjugate Roots $-\cfrac{1}{2}a + i\omega$ and $-\cfrac{1}{2}a - i\omega$</h3><p>In this case, $a^2 - 4b &lt; 0$, and since $\sqrt{-1} = i$, from Eq. ($\ref{eqn:lambdas}$) we have</p>\[\cfrac{1}{2}\sqrt{a^2 - 4b} = \cfrac{1}{2}\sqrt{-(4b - a^2)} = \sqrt{-(b-\frac{1}{4}a^2)} = i\sqrt{b - \frac{1}{4}a^2}\]<p>Here, let’s define the real number $\omega = \sqrt{b-\cfrac{1}{4}a^2}$.</p><p>With $\omega$ defined as above, the roots of the characteristic equation ($\ref{eqn:characteristic_eqn}$) are the complex conjugate roots $\lambda = -\cfrac{1}{2}a \pm i\omega$. The corresponding two complex solutions to equation ($\ref{eqn:ode_with_constant_coefficients}$) are</p>\[\begin{align*} e^{\lambda_1 x} &amp;= e^{-(a/2)x + i\omega x}, \\ e^{\lambda_2 x} &amp;= e^{-(a/2)x - i\omega x} \end{align*}\]<p>However, in this case, we can obtain a basis of real solutions as follows.</p><p>From Euler’s formula</p>\[e^{it} = \cos t + i\sin t \label{eqn:euler_formula}\tag{8}\]<p>and by substituting $-t$ for $t$ in the above equation to get</p>\[e^{-it} = \cos t - i\sin t\]<p>we can add and subtract these two equations to obtain:</p>\[\begin{align*} \cos t &amp;= \frac{1}{2}(e^{it} + e^{-it}), \\ \sin t &amp;= \frac{1}{2i}(e^{it} - e^{-it}). 
\end{align*} \label{eqn:cos_and_sin}\tag{9}\]<p>The complex exponential function $e^z$ of a complex variable $z = r + it$ with real part $r$ and imaginary part $it$ can be defined using the real functions $e^r$, $\cos t$, and $\sin t$ as follows.</p>\[e^z = e^{r + it} = e^r e^{it} = e^r(\cos t + i\sin t) \label{eqn:complex_exp}\tag{10}\]<p>Here, setting $r=-\cfrac{1}{2}ax$ and $t=\omega x$, we can write:</p>\[\begin{align*} e^{\lambda_1 x} &amp;= e^{-(a/2)x + i\omega x} = e^{-(a/2)x}(\cos{\omega x} + i\sin{\omega x}) \\ e^{\lambda_2 x} &amp;= e^{-(a/2)x - i\omega x} = e^{-(a/2)x}(\cos{\omega x} - i\sin{\omega x}) \end{align*}\]<p>By the <a href="/posts/homogeneous-linear-odes-of-second-order/#superposition-principle">superposition principle</a>, the sum and constant multiples of these complex solutions are also solutions. Therefore, by adding the two equations side by side and multiplying both sides by $\cfrac{1}{2}$, we can obtain the first real solution $y_1$ as follows.</p>\[y_1 = e^{-(a/2)x} \cos{\omega x}. \label{eqn:basis_1}\tag{11}\]<p>Similarly, by subtracting the second equation from the first and multiplying both sides by $\cfrac{1}{2i}$, we can obtain the second real solution $y_2$.</p>\[y_2 = e^{-(a/2)x} \sin{\omega x}. \label{eqn:basis_2}\tag{12}\]<p>Since $\cfrac{y_1}{y_2} = \cot{\omega x}$ is not a constant, $y_1$ and $y_2$ are linearly independent on any interval and thus form a basis of real solutions for equation ($\ref{eqn:ode_with_constant_coefficients}$). 
From this, we obtain the general solution</p>\[y = e^{-ax/2}(A\cos{\omega x} + B\sin{\omega x}) \quad \text{(where }A,\, B\text{ are arbitrary constants)} \label{eqn:general_sol_3}\tag{13}\] ]]> </content> </entry> <entry><title xml:lang="en">How to Support Multiple Languages on a Jekyll Blog with Polyglot (3) - Troubleshooting Chirpy Theme Build Failures and Search Function Errors</title><link href="https://www.yunseo.kim/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/" rel="alternate" type="text/html" hreflang="cs" 
/><published>2025-02-05T00:00:00+09:00</published> <updated>2025-10-08T00:53:56+09:00</updated> <id>https://www.yunseo.kim/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-3/</id> <author> <name>Yunseo Kim</name> </author> <category term="Dev" /> <category term="Web Dev" /> <summary xml:lang="en">This post introduces the process of implementing multi-language support on a Jekyll blog based on the &apos;jekyll-theme-chirpy&apos; by applying the Polyglot plugin. This is the third post in the series, covering the identification and resolution of errors that occur when applying Polyglot to the Chirpy theme.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>This post introduces the process of implementing multi-language support on a Jekyll blog based on the 'jekyll-theme-chirpy' by applying the Polyglot plugin. This is the third post in the series, covering the identification and resolution of errors that occur when applying Polyglot to the Chirpy theme.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="overview">Overview</h2><p>In early July 12024, I added multi-language support to this blog, which is hosted on GitHub Pages with Jekyll, by applying the <a href="https://github.com/untra/polyglot">Polyglot</a> plugin. This series shares the bugs encountered while applying the Polyglot plugin to the Chirpy theme, their solutions, and how to write the HTML header and sitemap.xml with SEO in mind. 
The series consists of 3 posts, and this is the third post in the series.</p><ul><li>Part 1: <a href="/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-1">Applying Polyglot Plugin &amp; Modifying HTML Header and Sitemap</a><li>Part 2: <a href="/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-2">Implementing Language Selection Button &amp; Localizing Layout Language</a><li>Part 3: Troubleshooting Chirpy Theme Build Failures and Search Function Errors (this post)</ul><blockquote class="prompt-info"><p>This series was originally planned as two parts, but as I added more content over several revisions, the length increased significantly, so it has been reorganized into three parts.</p></blockquote><h2 id="requirements">Requirements</h2><ul class="task-list"><li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />The built result (web pages) must be served under language-specific paths (e.g., <code class="language-plaintext filepath highlighter-rouge">/posts/ko/</code>, <code class="language-plaintext filepath highlighter-rouge">/posts/ja/</code>).<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />To minimize the additional time and effort for multi-language support, the build process should automatically recognize the language based on the local file path (e.g., <code class="language-plaintext filepath highlighter-rouge">/_posts/ko/</code>, <code class="language-plaintext filepath highlighter-rouge">/_posts/ja/</code>) without needing to manually specify ‘lang’ and ‘permalink’ tags in the YAML front matter of each Markdown file.<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />The header of each page on the site must meet Google’s SEO guidelines for multilingual search by including appropriate Content-Language meta tags, hreflang 
alternate tags, and canonical links.<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />The site must provide all language-specific page links in a single <code class="language-plaintext filepath highlighter-rouge">sitemap.xml</code> file without omissions, and this <code class="language-plaintext filepath highlighter-rouge">sitemap.xml</code> file must exist only at the root path without duplication.<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />All features provided by the <a href="https://github.com/cotes2020/jekyll-theme-chirpy">Chirpy theme</a> must function correctly on each language page. If not, they must be modified to work properly.<ul class="task-list"><li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />‘Recently Updated’ and ‘Trending Tags’ features work correctly.<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />No errors during the build process using GitHub Actions.<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />The post search function in the top-right corner of the blog works correctly.</ul></ul><h2 id="before-we-begin">Before We Begin</h2><p>This post is a continuation of <a href="/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-1">Part 1</a> and <a href="/posts/how-to-support-multi-language-on-jekyll-blog-with-polyglot-2">Part 2</a>, so if you haven’t read them yet, I recommend reading the previous posts first.</p><h2 id="troubleshooting-relative_url_regex-target-of-repeat-operator-is-not-specified">Troubleshooting (‘relative_url_regex’: target of repeat operator is not specified)</h2><p>(+ 12025.10.08. 
Update) <a href="https://polyglot.untra.io/2025/09/20/polyglot.1.11.0/">This bug was fixed in Polyglot version 1.11</a>.</p><p>After completing the previous steps, when I ran the <code class="language-plaintext highlighter-rouge">bundle exec jekyll serve</code> command to test the build, it failed with the error <code class="language-plaintext highlighter-rouge">'relative_url_regex': target of repeat operator is not specified</code>.</p><div class="language-shell highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre><td class="rouge-code"><pre>...<span class="o">(</span>omitted<span class="o">)</span>
                    <span class="nt">------------------------------------------------</span>
      Jekyll 4.3.4   Please append <span class="sb">`</span><span class="nt">--trace</span><span class="sb">`</span> to the <span class="sb">`</span>serve<span class="sb">`</span> <span class="nb">command 
                     </span><span class="k">for </span>any additional information or backtrace. 
                    <span class="nt">------------------------------------------------</span>
/Users/yunseo/.gem/ruby/3.2.2/gems/jekyll-polyglot-1.8.1/lib/jekyll/polyglot/
patches/jekyll/site.rb:234:in <span class="sb">`</span>relative_url_regex<span class="s1">': target of repeat operator 
is not specified: /href="?\/((?:(?!*.gem)(?!*.gemspec)(?!tools)(?!README.md)(
?!LICENSE)(?!*.config.js)(?!rollup.config.js)(?!package*.json)(?!.sass-cache)
(?!.jekyll-cache)(?!gemfiles)(?!Gemfile)(?!Gemfile.lock)(?!node_modules)(?!ve
ndor\/bundle\/)(?!vendor\/cache\/)(?!vendor\/gems\/)(?!vendor\/ruby\/)(?!en\/
)(?!ko\/)(?!es\/)(?!pt-BR\/)(?!ja\/)(?!fr\/)(?!de\/)[^,'</span><span class="s2">"</span><span class="se">\s\/</span><span class="s2">?.]+</span><span class="se">\.</span><span class="s2">?)*(?:</span><span class="se">\/</span><span class="s2">[^
</span><span class="se">\]\[</span><span class="s2">)("</span><span class="s1">'\s]*)?)"/ (RegexpError)

...(omitted)
</span></pre></div></div><p>After searching to see if similar issues had been reported, I found that <a href="https://github.com/untra/polyglot/issues/204">exactly the same issue</a> had already been registered in the Polyglot repository, and a solution existed.</p><p>The <a href="https://github.com/cotes2020/jekyll-theme-chirpy/blob/master/_config.yml">Chirpy theme’s <code class="language-plaintext filepath highlighter-rouge">_config.yml</code></a> file contains the following syntax:</p><div file="_config.yml" class="language-yml highlighter-rouge"><div class="highlight">class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre><td class="rouge-code"><pre><span class="na">exclude</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">*.gem"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">*.gemspec"</span>
  <span class="pi">-</span> <span class="s">docs</span>
  <span class="pi">-</span> <span class="s">tools</span>
  <span class="pi">-</span> <span class="s">README.md</span>
  <span class="pi">-</span> <span class="s">LICENSE</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">*.config.js"</span>
  <span class="pi">-</span> <span class="s">package*.json</span>
</pre></div></div><p>The cause of the problem lies in the regex syntax in the following two functions in <a href="https://github.com/untra/polyglot/blob/master/lib/jekyll/polyglot/patches/jekyll/site.rb">Polyglot’s <code class="language-plaintext filepath highlighter-rouge">site.rb</code></a>, which cannot properly handle globbing patterns with wildcards like <code class="language-plaintext highlighter-rouge">"*.gem"</code>, <code class="language-plaintext highlighter-rouge">"*.gemspec"</code>, and <code class="language-plaintext highlighter-rouge">"*.config.js"</code>.</p><div file="(polyglot root path)/lib/jekyll/polyglot/patches/jekyll/site.rb" class="language-ruby highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
</pre><td class="rouge-code"><pre>    <span class="c1"># a regex that matches relative urls in a html document</span>
    <span class="c1"># matches href="baseurl/foo/bar-baz" href="/foo/bar-baz" and others like it</span>
    <span class="c1"># avoids matching excluded files.  prepare makes sure</span>
    <span class="c1"># that all @exclude dirs have a trailing slash.</span>
    <span class="k">def</span> <span class="nf">relative_url_regex</span><span class="p">(</span><span class="n">disabled</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
      <span class="n">regex</span> <span class="o">=</span> <span class="s1">''</span>
      <span class="k">unless</span> <span class="n">disabled</span>
        <span class="vi">@exclude</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">x</span><span class="si">}</span><span class="s2">)"</span>
        <span class="k">end</span>
        <span class="vi">@languages</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">x</span><span class="si">}</span><span class="se">\/</span><span class="s2">)"</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="n">start</span> <span class="o">=</span> <span class="n">disabled</span> <span class="p">?</span> <span class="s1">'ferh'</span> <span class="p">:</span> <span class="s1">'href'</span>
      <span class="sr">%r{</span><span class="si">#{</span><span class="n">start</span><span class="si">}</span><span class="sr">="?</span><span class="si">#{</span><span class="vi">@baseurl</span><span class="si">}</span><span class="sr">/((?:</span><span class="si">#{</span><span class="n">regex</span><span class="si">}</span><span class="sr">[^,'"</span><span class="se">\s</span><span class="sr">/?.]+</span><span class="se">\.</span><span class="sr">?)*(?:/[^</span><span class="se">\]\[</span><span class="sr">)("'</span><span class="se">\s</span><span class="sr">]*)?)"}</span>
    <span class="k">end</span>

    <span class="c1"># a regex that matches absolute urls in a html document</span>
    <span class="c1"># matches href="http://baseurl/foo/bar-baz" and others like it</span>
    <span class="c1"># avoids matching excluded files.  prepare makes sure</span>
    <span class="c1"># that all @exclude dirs have a trailing slash.</span>
    <span class="k">def</span> <span class="nf">absolute_url_regex</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">disabled</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
      <span class="n">regex</span> <span class="o">=</span> <span class="s1">''</span>
      <span class="k">unless</span> <span class="n">disabled</span>
        <span class="vi">@exclude</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">x</span><span class="si">}</span><span class="s2">)"</span>
        <span class="k">end</span>
        <span class="vi">@languages</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">x</span><span class="si">}</span><span class="se">\/</span><span class="s2">)"</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="n">start</span> <span class="o">=</span> <span class="n">disabled</span> <span class="p">?</span> <span class="s1">'ferh'</span> <span class="p">:</span> <span class="s1">'href'</span>
      <span class="sr">%r{(?&lt;!hreflang="</span><span class="si">#{</span><span class="vi">@default_lang</span><span class="si">}</span><span class="sr">" )</span><span class="si">#{</span><span class="n">start</span><span class="si">}</span><span class="sr">="?</span><span class="si">#{</span><span class="n">url</span><span class="si">}#{</span><span class="vi">@baseurl</span><span class="si">}</span><span class="sr">/((?:</span><span class="si">#{</span><span class="n">regex</span><span class="si">}</span><span class="sr">[^,'"</span><span class="se">\s</span><span class="sr">/?.]+</span><span class="se">\.</span><span class="sr">?)*(?:/[^</span><span class="se">\]\[</span><span class="sr">)("'</span><span class="se">\s</span><span class="sr">]*)?)"}</span>
    <span class="k">end</span>
</pre></div></div><p>There are two ways to solve this problem.</p><h3 id="1-fork-polyglot-and-modify-the-problematic-parts">1. Fork Polyglot and modify the problematic parts</h3><p>As of the time of writing this post (11.12024), the <a href="https://jekyllrb.com/docs/configuration/options/#global-configuration">Jekyll official documentation</a> states that the <code class="language-plaintext highlighter-rouge">exclude</code> setting supports globbing patterns.</p><blockquote><p>“This configuration option supports Ruby’s File.fnmatch filename globbing patterns to match multiple entries to exclude.”</p></blockquote><p>In other words, the root cause is not in the Chirpy theme but in Polyglot’s <code class="language-plaintext highlighter-rouge">relative_url_regex()</code> and <code class="language-plaintext highlighter-rouge">absolute_url_regex()</code> functions, so the fundamental solution is to modify them to prevent the problem.</p><p><del>Since this bug has not yet been fixed in Polyglot,</del> As described above, <a href="https://polyglot.untra.io/2025/09/20/polyglot.1.11.0/">this issue has been fixed since Polyglot version 1.11</a>. At the time the problem occurred, it could be worked around by forking the Polyglot repository with reference to <del><a href="https://hionpu.com/posts/github_blog_4#4-polyglot-%EC%9D%98%EC%A1%B4%EC%84%B1-%EB%AC%B8%EC%A0%9C">this blog post</a> (site is gone) and</del> <a href="https://github.com/untra/polyglot/issues/204#issuecomment-2143270322">the answer to the previous GitHub issue</a>, modifying the problematic parts as follows, and using it in place of the original Polyglot.</p><div file="(polyglot root path)/lib/jekyll/polyglot/patches/jekyll/site.rb" class="language-ruby highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
</pre><td class="rouge-code"><pre>    <span class="k">def</span> <span class="nf">relative_url_regex</span><span class="p">(</span><span class="n">disabled</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
      <span class="n">regex</span> <span class="o">=</span> <span class="s1">''</span>
      <span class="k">unless</span> <span class="n">disabled</span>
        <span class="vi">@exclude</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">escaped_x</span> <span class="o">=</span> <span class="no">Regexp</span><span class="p">.</span><span class="nf">escape</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">escaped_x</span><span class="si">}</span><span class="s2">)"</span>
        <span class="k">end</span>
        <span class="vi">@languages</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">escaped_x</span> <span class="o">=</span> <span class="no">Regexp</span><span class="p">.</span><span class="nf">escape</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">escaped_x</span><span class="si">}</span><span class="se">\/</span><span class="s2">)"</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="n">start</span> <span class="o">=</span> <span class="n">disabled</span> <span class="p">?</span> <span class="s1">'ferh'</span> <span class="p">:</span> <span class="s1">'href'</span>
      <span class="sr">%r{</span><span class="si">#{</span><span class="n">start</span><span class="si">}</span><span class="sr">="?</span><span class="si">#{</span><span class="vi">@baseurl</span><span class="si">}</span><span class="sr">/((?:</span><span class="si">#{</span><span class="n">regex</span><span class="si">}</span><span class="sr">[^,'"</span><span class="se">\s</span><span class="sr">/?.]+</span><span class="se">\.</span><span class="sr">?)*(?:/[^</span><span class="se">\]\[</span><span class="sr">)("'</span><span class="se">\s</span><span class="sr">]*)?)"}</span>
    <span class="k">end</span>

    <span class="k">def</span> <span class="nf">absolute_url_regex</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">disabled</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
      <span class="n">regex</span> <span class="o">=</span> <span class="s1">''</span>
      <span class="k">unless</span> <span class="n">disabled</span>
        <span class="vi">@exclude</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">escaped_x</span> <span class="o">=</span> <span class="no">Regexp</span><span class="p">.</span><span class="nf">escape</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">escaped_x</span><span class="si">}</span><span class="s2">)"</span>
        <span class="k">end</span>
        <span class="vi">@languages</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
          <span class="n">escaped_x</span> <span class="o">=</span> <span class="no">Regexp</span><span class="p">.</span><span class="nf">escape</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
          <span class="n">regex</span> <span class="o">+=</span> <span class="s2">"(?!</span><span class="si">#{</span><span class="n">escaped_x</span><span class="si">}</span><span class="se">\/</span><span class="s2">)"</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="n">start</span> <span class="o">=</span> <span class="n">disabled</span> <span class="p">?</span> <span class="s1">'ferh'</span> <span class="p">:</span> <span class="s1">'href'</span>
      <span class="sr">%r{(?&lt;!hreflang="</span><span class="si">#{</span><span class="vi">@default_lang</span><span class="si">}</span><span class="sr">" )</span><span class="si">#{</span><span class="n">start</span><span class="si">}</span><span class="sr">="?</span><span class="si">#{</span><span class="n">url</span><span class="si">}#{</span><span class="vi">@baseurl</span><span class="si">}</span><span class="sr">/((?:</span><span class="si">#{</span><span class="n">regex</span><span class="si">}</span><span class="sr">[^,'"</span><span class="se">\s</span><span class="sr">/?.]+</span><span class="se">\.</span><span class="sr">?)*(?:/[^</span><span class="se">\]\[</span><span class="sr">)("'</span><span class="se">\s</span><span class="sr">]*)?)"}</span>
    <span class="k">end</span>
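The following is my own simplified sketch (not part of the plugin) of the lookahead chain these methods build:

```ruby
# Simplified illustration of the technique above: each language code
# contributes one "(?!lang/)" negative lookahead, so an href whose path
# already starts with a language prefix is never matched (and therefore
# never relativized a second time).
languages = %w[ko ja]
lookaheads = languages.map { |lang| "(?!#{Regexp.escape(lang)}/)" }.join
regex = %r{href="/#{lookaheads}([^"]*)"}

puts regex.match?('href="/posts/example/"')    # true  -- plain URL, will be rewritten
puts regex.match?('href="/ko/posts/example/"') # false -- already localized, skipped
```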
</pre></div></div><h3 id="2-replace-globbing-patterns-with-exact-filenames-in-the-chirpy-themes-_configyml-configuration-file">2. Replace globbing patterns with exact filenames in the Chirpy theme’s ‘_config.yml’ configuration file</h3><p>The ideal solution would be for the above patch to be merged into upstream Polyglot. Until then, however, you would have to maintain a forked version and keep it in sync with upstream Polyglot releases, which is cumbersome. I therefore took a different approach.</p><p>If you check the files in the root path of the <a href="https://github.com/cotes2020/jekyll-theme-chirpy">Chirpy theme repository</a> that match the patterns <code class="language-plaintext highlighter-rouge">"*.gem"</code>, <code class="language-plaintext highlighter-rouge">"*.gemspec"</code>, and <code class="language-plaintext highlighter-rouge">"*.config.js"</code>, there are only three:</p><ul><li><code class="language-plaintext filepath highlighter-rouge">jekyll-theme-chirpy.gemspec</code><li><code class="language-plaintext filepath highlighter-rouge">purgecss.config.js</code><li><code class="language-plaintext filepath highlighter-rouge">rollup.config.js</code></ul><p>Therefore, you can delete the globbing patterns in the <code class="language-plaintext highlighter-rouge">exclude</code> section of the <code class="language-plaintext filepath highlighter-rouge">_config.yml</code> file and replace them with exact filenames as follows, so that Polyglot can process them without issues.</p><div file="_config.yml" class="language-yml highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre><td class="rouge-code"><pre><span class="na">exclude</span><span class="pi">:</span> <span class="c1"># Modified with reference to https://github.com/untra/polyglot/issues/204</span>
  <span class="c1"># - "*.gem"</span>
  <span class="pi">-</span> <span class="s">jekyll-theme-chirpy.gemspec</span> <span class="c1"># - "*.gemspec"</span>
  <span class="pi">-</span> <span class="s">tools</span>
  <span class="pi">-</span> <span class="s">README.md</span>
  <span class="pi">-</span> <span class="s">LICENSE</span>
  <span class="pi">-</span> <span class="s">purgecss.config.js</span> <span class="c1"># - "*.config.js"</span>
  <span class="pi">-</span> <span class="s">rollup.config.js</span>
  <span class="pi">-</span> <span class="s">package*.json</span>
</pre></div></div><h2 id="modifying-the-search-function">Modifying the Search Function</h2><p>After completing the previous steps, almost all site features worked as intended. However, I later discovered that the search bar in the upper-right corner of pages using the Chirpy theme could not find pages written in languages other than <code class="language-plaintext highlighter-rouge">site.default_lang</code> (English in the case of this blog), and that searching from non-English pages still returned links to English pages in the results.</p><p>To understand the cause, let’s look at which files are involved in the search function and where the problem occurs.</p><h3 id="_layoutsdefaulthtml">‘_layouts/default.html’</h3><p>Looking at the <a href="https://github.com/cotes2020/jekyll-theme-chirpy/blob/master/_layouts/default.html"><code class="language-plaintext filepath highlighter-rouge">_layouts/default.html</code></a> file that forms the template for all pages on the blog, we can see that it loads the contents of <code class="language-plaintext filepath highlighter-rouge">search-results.html</code> and <code class="language-plaintext filepath highlighter-rouge">search-loader.html</code> inside the <code class="language-plaintext highlighter-rouge">&lt;body&gt;</code> element.</p><div file="_layouts/default.html" class="language-liquid highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre><td class="rouge-code"><pre>  &lt;body&gt;
    <span class="cp">{%</span><span class="w"> </span><span class="nt">include</span><span class="w"> </span>sidebar.html<span class="w"> </span><span class="na">lang</span><span class="o">=</span><span class="nv">lang</span><span class="w"> </span><span class="cp">%}</span>

    &lt;div id="main-wrapper" class="d-flex justify-content-center"&gt;
      &lt;div class="container d-flex flex-column px-xxl-5"&gt;
        
        (...omitted...)

        <span class="cp">{%</span><span class="w"> </span><span class="nt">include_cached</span><span class="w"> </span><span class="nv">search-results</span><span class="p">.</span><span class="nv">html</span><span class="w"> </span><span class="na">lang</span><span class="o">=</span><span class="nv">lang</span><span class="w"> </span><span class="cp">%}</span>
      &lt;/div&gt;

      &lt;aside aria-label="Scroll to Top"&gt;
        &lt;button id="back-to-top" type="button" class="btn btn-lg btn-box-shadow"&gt;
          &lt;i class="fas fa-angle-up"&gt;&lt;/i&gt;
        &lt;/button&gt;
      &lt;/aside&gt;
    &lt;/div&gt;

    (...omitted...)

    <span class="cp">{%</span><span class="w"> </span><span class="nt">include_cached</span><span class="w"> </span><span class="nv">search-loader</span><span class="p">.</span><span class="nv">html</span><span class="w"> </span><span class="na">lang</span><span class="o">=</span><span class="nv">lang</span><span class="w"> </span><span class="cp">%}</span>
  &lt;/body&gt;
</pre></div></div><h3 id="_includessearch-resulthtml">‘_includes/search-results.html’</h3><p><a href="https://github.com/cotes2020/jekyll-theme-chirpy/blob/master/_includes/search-results.html"><code class="language-plaintext filepath highlighter-rouge">_includes/search-results.html</code></a> creates a <code class="language-plaintext highlighter-rouge">search-results</code> container to store search results for keywords entered in the search box.</p><div file="_includes/search-results.html" class="language-html highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre><td class="rouge-code"><pre><span class="c">&lt;!-- The Search results --&gt;</span>

<span class="nt">&lt;div</span> <span class="na">id=</span><span class="s">"search-result-wrapper"</span> <span class="na">class=</span><span class="s">"d-flex justify-content-center d-none"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"col-11 content"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">id=</span><span class="s">"search-hints"</span><span class="nt">&gt;</span>
      {% include_cached trending-tags.html %}
    <span class="nt">&lt;/div&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">id=</span><span class="s">"search-results"</span> <span class="na">class=</span><span class="s">"d-flex flex-wrap justify-content-center text-muted mt-3"</span><span class="nt">&gt;&lt;/div&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
<span class="nt">&lt;/div&gt;</span>
</pre></div></div><h3 id="_includessearch-loaderhtml">‘_includes/search-loader.html’</h3><p><a href="https://github.com/cotes2020/jekyll-theme-chirpy/blob/master/_includes/search-loader.html"><code class="language-plaintext filepath highlighter-rouge">_includes/search-loader.html</code></a> is the core of the search feature, built on the <a href="https://github.com/christian-fei/Simple-Jekyll-Search">Simple-Jekyll-Search</a> library. It runs entirely on the client side: JavaScript in the visitor’s browser matches the input keywords against the <a href="#assetsjsdatasearchjson"><code class="language-plaintext filepath highlighter-rouge">search.json</code></a> index file and returns the corresponding post links as <code class="language-plaintext highlighter-rouge">&lt;article&gt;</code> elements.</p><div file="_includes/search-loader.html" class="language-js highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
</pre><td class="rouge-code"><pre><span class="p">{</span><span class="o">%</span> <span class="nx">capture</span> <span class="nx">result_elem</span> <span class="o">%</span><span class="p">}</span>
  <span class="o">&lt;</span><span class="nx">article</span> <span class="kd">class</span><span class="o">=</span><span class="dl">"</span><span class="s2">px-1 px-sm-2 px-lg-4 px-xl-0</span><span class="dl">"</span><span class="o">&gt;</span>
    <span class="o">&lt;</span><span class="nx">header</span><span class="o">&gt;</span>
      <span class="o">&lt;</span><span class="nx">h2</span><span class="o">&gt;&lt;</span><span class="nx">a</span> <span class="nx">href</span><span class="o">=</span><span class="dl">"</span><span class="s2">{url}</span><span class="dl">"</span><span class="o">&gt;</span><span class="p">{</span><span class="nx">title</span><span class="p">}</span><span class="o">&lt;</span><span class="sr">/a&gt;&lt;/</span><span class="nx">h2</span><span class="o">&gt;</span>
      <span class="o">&lt;</span><span class="nx">div</span> <span class="kd">class</span><span class="o">=</span><span class="dl">"</span><span class="s2">post-meta d-flex flex-column flex-sm-row text-muted mt-1 mb-1</span><span class="dl">"</span><span class="o">&gt;</span>
        <span class="p">{</span><span class="nx">categories</span><span class="p">}</span>
        <span class="p">{</span><span class="nx">tags</span><span class="p">}</span>
      <span class="o">&lt;</span><span class="sr">/div</span><span class="err">&gt;
</span>    <span class="o">&lt;</span><span class="sr">/header</span><span class="err">&gt;
</span>    <span class="o">&lt;</span><span class="nx">p</span><span class="o">&gt;</span><span class="p">{</span><span class="nx">snippet</span><span class="p">}</span><span class="o">&lt;</span><span class="sr">/p</span><span class="err">&gt;
</span>  <span class="o">&lt;</span><span class="sr">/article</span><span class="err">&gt;
</span><span class="p">{</span><span class="o">%</span> <span class="nx">endcapture</span> <span class="o">%</span><span class="p">}</span>

<span class="p">{</span><span class="o">%</span> <span class="nx">capture</span> <span class="nx">not_found</span> <span class="o">%</span><span class="p">}</span><span class="o">&lt;</span><span class="nx">p</span> <span class="kd">class</span><span class="o">=</span><span class="dl">"</span><span class="s2">mt-5</span><span class="dl">"</span><span class="o">&gt;</span><span class="p">{{</span> <span class="nx">site</span><span class="p">.</span><span class="nx">data</span><span class="p">.</span><span class="nx">locales</span><span class="p">[</span><span class="nx">include</span><span class="p">.</span><span class="nx">lang</span><span class="p">].</span><span class="nx">search</span><span class="p">.</span><span class="nx">no_results</span> <span class="p">}}</span><span class="o">&lt;</span><span class="sr">/p&gt;{% endcapture %</span><span class="err">}
</span>
<span class="o">&lt;</span><span class="nx">script</span><span class="o">&gt;</span>
  <span class="p">{</span><span class="o">%</span> <span class="nx">comment</span> <span class="o">%</span><span class="p">}</span> <span class="nl">Note</span><span class="p">:</span> <span class="nx">dependent</span> <span class="nx">library</span> <span class="nx">will</span> <span class="nx">be</span> <span class="nx">loaded</span> <span class="k">in</span> <span class="s2">`js-selector.html`</span> <span class="p">{</span><span class="o">%</span> <span class="nx">endcomment</span> <span class="o">%</span><span class="p">}</span>
  <span class="nb">document</span><span class="p">.</span><span class="nf">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">DOMContentLoaded</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nc">SimpleJekyllSearch</span><span class="p">({</span>
      <span class="na">searchInput</span><span class="p">:</span> <span class="nb">document</span><span class="p">.</span><span class="nf">getElementById</span><span class="p">(</span><span class="dl">'</span><span class="s1">search-input</span><span class="dl">'</span><span class="p">),</span>
      <span class="na">resultsContainer</span><span class="p">:</span> <span class="nb">document</span><span class="p">.</span><span class="nf">getElementById</span><span class="p">(</span><span class="dl">'</span><span class="s1">search-results</span><span class="dl">'</span><span class="p">),</span>
      <span class="na">json</span><span class="p">:</span> <span class="dl">'</span><span class="s1">{{ </span><span class="dl">'</span><span class="o">/</span><span class="nx">assets</span><span class="o">/</span><span class="nx">js</span><span class="o">/</span><span class="nx">data</span><span class="o">/</span><span class="nx">search</span><span class="p">.</span><span class="nx">json</span><span class="dl">'</span><span class="s1"> | relative_url }}</span><span class="dl">'</span><span class="p">,</span>
      <span class="na">searchResultTemplate</span><span class="p">:</span> <span class="dl">'</span><span class="s1">{{ result_elem | strip_newlines }}</span><span class="dl">'</span><span class="p">,</span>
      <span class="na">noResultsText</span><span class="p">:</span> <span class="dl">'</span><span class="s1">{{ not_found }}</span><span class="dl">'</span><span class="p">,</span>
      <span class="na">templateMiddleware</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">prop</span><span class="p">,</span> <span class="nx">value</span><span class="p">,</span> <span class="nx">template</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if </span><span class="p">(</span><span class="nx">prop</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">categories</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
          <span class="k">if </span><span class="p">(</span><span class="nx">value</span> <span class="o">===</span> <span class="dl">''</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">return</span> <span class="s2">`</span><span class="p">${</span><span class="nx">value</span><span class="p">}</span><span class="s2">`</span><span class="p">;</span>
          <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="s2">`&lt;div class="me-sm-4"&gt;&lt;i class="far fa-folder fa-fw"&gt;&lt;/i&gt;</span><span class="p">${</span><span class="nx">value</span><span class="p">}</span><span class="s2">&lt;/div&gt;`</span><span class="p">;</span>
          <span class="p">}</span>
        <span class="p">}</span>

        <span class="k">if </span><span class="p">(</span><span class="nx">prop</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">tags</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
          <span class="k">if </span><span class="p">(</span><span class="nx">value</span> <span class="o">===</span> <span class="dl">''</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">return</span> <span class="s2">`</span><span class="p">${</span><span class="nx">value</span><span class="p">}</span><span class="s2">`</span><span class="p">;</span>
          <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="s2">`&lt;div&gt;&lt;i class="fa fa-tag fa-fw"&gt;&lt;/i&gt;</span><span class="p">${</span><span class="nx">value</span><span class="p">}</span><span class="s2">&lt;/div&gt;`</span><span class="p">;</span>
          <span class="p">}</span>
        <span class="p">}</span>
      <span class="p">}</span>
    <span class="p">});</span>
  <span class="p">});</span>
<span class="o">&lt;</span><span class="sr">/script</span><span class="err">&gt;
</span></pre></div></div><h3 id="assetsjsdatasearchjson">‘/assets/js/data/search.json’</h3><div file="/assets/js/data/search.json" class="language-liquid highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre><td class="rouge-code"><pre>---
layout: compress
swcache: true
---

[
  <span class="cp">{%</span><span class="w"> </span><span class="nt">for</span><span class="w"> </span><span class="nv">post</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nv">site</span><span class="p">.</span><span class="nv">posts</span><span class="w"> </span><span class="cp">%}</span>
  {
    "title": <span class="cp">{{</span><span class="w"> </span><span class="nv">post</span><span class="p">.</span><span class="nv">title</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>,
    "url": <span class="cp">{{</span><span class="w"> </span><span class="nv">post</span><span class="p">.</span><span class="nv">url</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">relative_url</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>,
    "categories": <span class="cp">{{</span><span class="w"> </span><span class="nv">post</span><span class="p">.</span><span class="nv">categories</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">join</span><span class="p">:</span><span class="w"> </span><span class="s1">', '</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>,
    "tags": <span class="cp">{{</span><span class="w"> </span><span class="nv">post</span><span class="p">.</span><span class="nv">tags</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">join</span><span class="p">:</span><span class="w"> </span><span class="s1">', '</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>,
    "date": "<span class="cp">{{</span><span class="w"> </span><span class="nv">post</span><span class="p">.</span><span class="nv">date</span><span class="w"> </span><span class="cp">}}</span>",
    <span class="cp">{%</span><span class="w"> </span><span class="nt">include</span><span class="w"> </span>no-linenos.html<span class="w"> </span><span class="na">content</span><span class="o">=</span><span class="nv">post</span><span class="p">.</span><span class="nv">content</span><span class="w"> </span><span class="cp">%}</span>
    <span class="cp">{%</span><span class="w"> </span><span class="nt">assign</span><span class="w"> </span><span class="nv">_content</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">content</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">strip_html</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">strip_newlines</span><span class="w"> </span><span class="cp">%}</span>
    "snippet": <span class="cp">{{</span><span class="w"> </span><span class="nv">_content</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">truncate</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>,
    "content": <span class="cp">{{</span><span class="w"> </span><span class="nv">_content</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="nf">jsonify</span><span class="w"> </span><span class="cp">}}</span>
  }<span class="cp">{%</span><span class="w"> </span><span class="nt">unless</span><span class="w"> </span><span class="nb">forloop.last</span><span class="w"> </span><span class="cp">%}</span>,<span class="cp">{%</span><span class="w"> </span><span class="nt">endunless</span><span class="w"> </span><span class="cp">%}</span>
  <span class="cp">{%</span><span class="w"> </span><span class="nt">endfor</span><span class="w"> </span><span class="cp">%}</span>
]
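A hypothetical example of a single entry this template produces (field values invented for illustration):

```javascript
// Hypothetical example of one entry generated by the template above
// (values made up for illustration). Note that "url" carries no language
// prefix -- this is exactly the gap discussed in the next section.
const exampleEntry = {
  title: 'Example Post',
  url: '/posts/example-post/',
  categories: 'Blogging, Tutorial',
  tags: 'Jekyll, Polyglot',
  date: '2024-01-01 00:00:00 +0900',
  snippet: 'First 200 characters of the post body...',
  content: 'Full post text with HTML and newlines stripped...'
};
console.log(Object.keys(exampleEntry).length); // 7
```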
</pre></div></div><p>This file uses Jekyll’s Liquid syntax to define a JSON file containing the title, URL, category and tag information, creation date, the first 200 characters of the content as a snippet, and the full content of all posts on the site.</p><h3 id="search-function-structure-and-problem-identification">Search Function Structure and Problem Identification</h3><p>To summarize, when hosting the Chirpy theme on GitHub Pages, the search function operates through the following process:</p><pre><code class="language-mermaid">stateDiagram
  state "Changes" as CH
  state "Build start" as BLD
  state "Create search.json" as IDX
  state "Static Website" as DEP
  state "In Test" as TST
  state "Search Loader" as SCH
  state "Results" as R
    
  [*] --&gt; CH: Make Changes
  CH --&gt; BLD: Commit &amp; Push origin
  BLD --&gt; IDX: jekyll build
  IDX --&gt; TST: Build Complete
  TST --&gt; CH: Error Detected
  TST --&gt; DEP: Deploy
  DEP --&gt; SCH: Search Input
  SCH --&gt; R: Return Results
  R --&gt; [*]
</code></pre><p>I confirmed that <code class="language-plaintext filepath highlighter-rouge">search.json</code> is created for each language by Polyglot as follows:</p><ul><li><code class="language-plaintext filepath highlighter-rouge">/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/ko/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/ja/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/zh-TW/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/es/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/pt-BR/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/fr/assets/js/data/search.json</code><li><code class="language-plaintext filepath highlighter-rouge">/de/assets/js/data/search.json</code></ul><p>Therefore, the problematic part is the “Search Loader”. 
The issue of non-English pages not being searchable occurs because <code class="language-plaintext filepath highlighter-rouge">_includes/search-loader.html</code> statically loads only the English index file (<code class="language-plaintext filepath highlighter-rouge">/assets/js/data/search.json</code>) regardless of the language of the page being visited.</p><blockquote class="prompt-warning"><ul><li>Note that unlike Markdown or HTML files, in JSON files Polyglot’s wrappers for Jekyll-provided variables such as <code class="language-plaintext highlighter-rouge">post.title</code> and <code class="language-plaintext highlighter-rouge">post.content</code> do work, but its <a href="https://github.com/untra/polyglot?tab=readme-ov-file#relativized-local-urls">Relativized Local Urls</a> feature does not seem to.<li>Similarly, I confirmed during testing that within JSON file templates, the <a href="https://github.com/untra/polyglot?tab=readme-ov-file#features">additional Liquid variables provided by Polyglot</a>, such as <code class="language-plaintext highlighter-rouge">{{ site.default_lang }}</code> and <code class="language-plaintext highlighter-rouge">{{ site.active_lang }}</code>, cannot be accessed; only the variables provided by Jekyll itself are available.</ul><p>Therefore, while values like <code class="language-plaintext highlighter-rouge">title</code>, <code class="language-plaintext highlighter-rouge">snippet</code>, and <code class="language-plaintext highlighter-rouge">content</code> in the index file are generated per language, the <code class="language-plaintext highlighter-rouge">url</code> value returns the default path with no language prefix, so appropriate handling must be added in the “Search Loader”.</p></blockquote><h3 id="solution">Solution</h3><p>To solve this, modify the content of <code class="language-plaintext filepath highlighter-rouge">_includes/search-loader.html</code> as follows:</p><div file="_includes/search-loader.html" 
class="language-plaintext highlighter-rouge"><div class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
</pre><td class="rouge-code"><pre>{% capture result_elem %}
  &lt;article class="px-1 px-sm-2 px-lg-4 px-xl-0"&gt;
    &lt;header&gt;
      {% if site.active_lang != site.default_lang %}
      &lt;h2&gt;&lt;a {% static_href %}href="/{{ site.active_lang }}{url}"{% endstatic_href %}&gt;{title}&lt;/a&gt;&lt;/h2&gt;
      {% else %}
      &lt;h2&gt;&lt;a href="{url}"&gt;{title}&lt;/a&gt;&lt;/h2&gt;
      {% endif %}

(...omitted...)

&lt;script&gt;
  {% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
  document.addEventListener('DOMContentLoaded', () =&gt; {
    {% assign search_path = '/assets/js/data/search.json' %}
    {% if site.active_lang != site.default_lang %}
      {% assign search_path = '/' | append: site.active_lang | append: search_path %}
    {% endif %}
    
    SimpleJekyllSearch({
      searchInput: document.getElementById('search-input'),
      resultsContainer: document.getElementById('search-results'),
      json: '{{ search_path | relative_url }}',
      searchResultTemplate: '{{ result_elem | strip_newlines }}',

(...omitted)
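For clarity, the build-time Liquid branch above can be restated as a plain function (illustration only, not theme code):

```javascript
// Illustration only (not theme code): the Liquid branch above, restated
// as a plain function. The language code is prepended to the index path
// unless the active language is the site default.
function searchIndexPath(activeLang, defaultLang) {
  const base = '/assets/js/data/search.json';
  return activeLang === defaultLang ? base : '/' + activeLang + base;
}

console.log(searchIndexPath('en', 'en')); // /assets/js/data/search.json
console.log(searchIndexPath('ko', 'en')); // /ko/assets/js/data/search.json
```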
</pre></div></div><ul><li>I modified the liquid syntax in the <code class="language-plaintext highlighter-rouge">{% capture result_elem %}</code> section to add the prefix <code class="language-plaintext highlighter-rouge">"/{{ site.active_lang }}"</code> before the post URL loaded from the JSON file when <code class="language-plaintext highlighter-rouge">site.active_lang</code> (current page language) is different from <code class="language-plaintext highlighter-rouge">site.default_lang</code> (site default language).<li>Similarly, I modified the <code class="language-plaintext highlighter-rouge">&lt;script&gt;</code> section to compare the current page language with the site default language during the build process, and set <code class="language-plaintext highlighter-rouge">search_path</code> to the default path (<code class="language-plaintext filepath highlighter-rouge">/assets/js/data/search.json</code>) if they are the same, or to the language-specific path (e.g., <code class="language-plaintext filepath highlighter-rouge">/ko/assets/js/data/search.json</code>) if they are different.</ul><p>After making these modifications and rebuilding the website, I confirmed that search results are displayed correctly for each language.</p><blockquote class="prompt-tip"><p>Since <code class="language-plaintext highlighter-rouge">{url}</code> is a placeholder for the URL value that will be read from the JSON file by JavaScript during search execution, and not a valid URL at build time, it is not recognized as a localization target by Polyglot and must be handled directly. The problem is that the resulting template, <code class="language-plaintext highlighter-rouge">"/{{ site.active_lang }}{url}"</code>, is recognized as a relative URL at build time. Although localization has already been completed, Polyglot is unaware of this and attempts to perform it again (e.g., <code class="language-plaintext filepath highlighter-rouge">"/ko/ko/posts/example-post"</code>). 
To prevent this, I specified the <a href="https://github.com/untra/polyglot?tab=readme-ov-file#disabling-url-relativizing"><code class="language-plaintext highlighter-rouge">{% static_href %}</code> tag</a>.</p></blockquote>]]> </content> </entry> <entry><title xml:lang="en">Homogeneous Linear ODEs of Second Order</title><link href="https://www.yunseo.kim/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="en" /><link href="https://www.yunseo.kim/ko/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="ko" /><link href="https://www.yunseo.kim/ja/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="ja" /><link href="https://www.yunseo.kim/zh-TW/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="zh-TW" /><link href="https://www.yunseo.kim/es/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="es" /><link href="https://www.yunseo.kim/pt-BR/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="pt-BR" /><link href="https://www.yunseo.kim/fr/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="fr" /><link href="https://www.yunseo.kim/de/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="de" /><link href="https://www.yunseo.kim/pl/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="pl" /><link href="https://www.yunseo.kim/cs/posts/homogeneous-linear-odes-of-second-order/" rel="alternate" type="text/html" hreflang="cs" /><published>2025-01-13T00:00:00+09:00</published> <updated>2025-07-09T19:24:14+09:00</updated> <id>https://www.yunseo.kim/posts/homogeneous-linear-odes-of-second-order/</id> <author> <name>Yunseo Kim</name> </author> <category term="Mathematics" /> <category term="Differential Equation" /> <summary 
xml:lang="en">Learn the definition and properties of second-order linear ordinary differential equations, focusing on the superposition principle for homogeneous linear ODEs and the related concept of a basis.</summary> <content type="html" xml:lang="en"> <![CDATA[<p>Learn the definition and properties of second-order linear ordinary differential equations, focusing on the superposition principle for homogeneous linear ODEs and the related concept of a basis.</p><em><p>* Mathematical equations and diagrams included in posts may not display properly when viewed with a feed reader.</p></em><h2 id="tldr">TL;DR</h2><blockquote class="prompt-info"><ul><li><strong>Standard form</strong> of a second-order linear ODE: $y^{\prime\prime} + p(x)y^{\prime} + q(x)y = r(x)$<ul><li><strong>Coefficients</strong>: Functions $p$, $q$<li><strong>Input</strong>: $r(x)$<li><strong>Output</strong> or <strong>response</strong>: $y(x)$</ul><li>Homogeneous and Nonhomogeneous<ul><li><strong>Homogeneous</strong>: When $r(x)\equiv0$ in the standard form.<li><strong>Nonhomogeneous</strong>: When $r(x)\not\equiv 0$ in the standard form.</ul><li><strong>Superposition principle</strong>: For a <u>homogeneous</u> linear ODE $y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0$, any linear combination of two of its solutions on an open interval $I$ is also a solution of the given equation. That is, the sum and constant multiples of any solutions to the given homogeneous linear ODE are also solutions.<li><strong>Basis</strong> or <strong>fundamental system</strong>: A pair of linearly independent solutions $(y_1, y_2)$ of a homogeneous linear ODE on an interval $I$.<li><strong>Reduction of order</strong>: If one solution to a second-order homogeneous ODE is known, a second, linearly independent solution (i.e., a basis) can be found by solving a first-order ODE. 
This method is called reduction of order.<li>Applications of reduction of order: A general second-order ODE $F(x, y, y^\prime, y^{\prime\prime})=0$, whether linear or nonlinear, can be reduced to a first-order ODE using reduction of order in the following cases:<ul><li>$y$ does not appear explicitly.<li>$x$ does not appear explicitly.<li>The equation is homogeneous linear and one solution is already known.</ul></ul></blockquote><h2 id="prerequisites">Prerequisites</h2><ul><li><a href="/posts/Basic-Concepts-of-Modeling/">Basic Concepts of Modeling</a><li><a href="/posts/Separation-of-Variables/">Separation of Variables</a><li><a href="/posts/Solution-of-First-Order-Linear-ODE/">Solution of First-Order Linear ODEs</a></ul><h2 id="second-order-linear-odes">Second-Order Linear ODEs</h2><p>A second-order ordinary differential equation is called <strong>linear</strong> if it can be written in the form</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = r(x) \label{eqn:standard_form}\tag{1}\]<p>and <strong>nonlinear</strong> otherwise.</p><p>Here $p$, $q$, and $r$ may be any functions of $x$; what makes the equation linear is that it is linear in the unknown function $y$ and its derivatives.</p><p>The form of Eq. ($\ref{eqn:standard_form}$) is called the <strong>standard form</strong> of a second-order linear ODE. If the first term of a given second-order linear ODE is $f(x)y^{\prime\prime}$, we can obtain the standard form by dividing both sides of the equation by $f(x)$.</p><p>The functions $p$ and $q$ are called <strong>coefficients</strong>, $r(x)$ is the <strong>input</strong>, and $y(x)$ is the <strong>output</strong> or the <strong>response</strong> to the input and initial conditions.</p><h3 id="homogeneous-second-order-linear-odes">Homogeneous Second-Order Linear ODEs</h3><p>Let $J$ be an interval $a&lt;x&lt;b$ on which we want to solve Eq. ($\ref{eqn:standard_form}$). If $r(x)\equiv 0$ on the interval $J$ in Eq. 
($\ref{eqn:standard_form}$), then</p>\[y^{\prime\prime} + p(x)y^{\prime} + q(x)y = 0 \label{eqn:homogeneous_linear_ode}\tag{2}\]<p>and this is called <strong>homogeneous</strong>.</p><h2 id="nonhomogeneous-linear-odes">Nonhomogeneous Linear ODEs</h2><p>If $r(x)\not\equiv 0$ in the interval $J$, the equation is called <strong>nonhomogeneous</strong>.</p><h2 id="superposition-principle">Superposition Principle</h2><p>A function of the form \(y = c_1y_1 + c_2y_2 \quad \text{(where }c_1, c_2\text{ are arbitrary constants)}\tag{3}\) is called a <strong>linear combination</strong> of $y_1$ and $y_2$.</p><p>The following holds true.</p><blockquote class="prompt-info"><p><strong>Superposition principle</strong><br /> For the homogeneous linear ODE ($\ref{eqn:homogeneous_linear_ode}$), any linear combination of two of its solutions on an open interval $I$ is also a solution of Eq. ($\ref{eqn:homogeneous_linear_ode}$). That is, the sum and constant multiples of any solutions to the given homogeneous linear ODE are also solutions.</p></blockquote><h3 id="proof">Proof</h3><p>Let $y_1$ and $y_2$ be solutions of Eq. ($\ref{eqn:homogeneous_linear_ode}$) on an interval $I$. Substituting $y=c_1y_1+c_2y_2$ into Eq. ($\ref{eqn:homogeneous_linear_ode}$) gives</p>\[\begin{align*} y^{\prime\prime} + py^{\prime} + qy &amp;= (c_1y_1+c_2y_2)^{\prime\prime} + p(c_1y_1+c_2y_2)^{\prime} + q(c_1y_1+c_2y_2) \\ &amp;= c_1y_1^{\prime\prime} + c_2y_2^{\prime\prime} + p(c_1y_1^{\prime} + c_2y_2^{\prime}) + q(c_1y_1+c_2y_2) \\ &amp;= c_1(y_1^{\prime\prime} + py_1^{\prime} + qy_1) + c_2(y_2^{\prime\prime} + py_2^{\prime} + qy_2) \\ &amp;= 0 \end{align*}\]<p>which becomes an identity. Therefore, $y$ is a solution of Eq. ($\ref{eqn:homogeneous_linear_ode}$) on the interval $I$. 
$\blacksquare$</p><blockquote class="prompt-warning"><p>Note that the superposition principle holds only for homogeneous linear ODEs and not for nonhomogeneous linear or nonlinear ODEs.</p></blockquote><h2 id="basis-and-general-solution">Basis and General Solution</h2><h3 id="review-of-key-concepts-from-first-order-odes">Review of Key Concepts from First-Order ODEs</h3><p>As we saw previously in <a href="/posts/Basic-Concepts-of-Modeling/">Basic Concepts of Modeling</a>, an Initial Value Problem for a first-order ODE consists of the ODE and an initial condition $y(x_0)=y_0$. The initial condition is necessary to determine the arbitrary constant $c$ in the general solution of the given ODE, and the resulting solution is called a particular solution. Let’s now extend these concepts to second-order ODEs.</p><h3 id="initial-value-problem-and-initial-conditions">Initial Value Problem and Initial Conditions</h3><p>An <strong>initial value problem</strong> for the second-order homogeneous ODE ($\ref{eqn:homogeneous_linear_ode}$) consists of the given ODE ($\ref{eqn:homogeneous_linear_ode}$) and two <strong>initial conditions</strong></p>\[y(x_0) = K_0, \quad y^{\prime}(x_0)=K_1 \label{eqn:init_conditions}\tag{4}\]<p>These conditions are needed to determine the two arbitrary constants $c_1$ and $c_2$ in the <strong>general solution</strong> of the ODE</p>\[y = c_1y_1 + c_2y_2 \label{eqn:general_sol}\tag{5}\]<h3 id="linear-independence-and-dependence">Linear Independence and Dependence</h3><p>Let’s briefly discuss the concepts of linear independence and dependence. 
This is necessary to define a basis later.<br /> Two functions $y_1$ and $y_2$ are said to be <strong>linearly independent</strong> on an interval $I$ where they are defined if for all points in $I$,</p>\[k_1y_1(x) + k_2y_2(x) = 0 \Leftrightarrow k_1=0\text{ and }k_2=0 \label{eqn:linearly_independent}\tag{6}\]<p>Otherwise, $y_1$ and $y_2$ are said to be <strong>linearly dependent</strong>.</p><p>If $y_1$ and $y_2$ are linearly dependent (i.e., statement ($\ref{eqn:linearly_independent}$) is not true), then the equation in ($\ref{eqn:linearly_independent}$) holds with $k_1 \neq 0$ or $k_2 \neq 0$, and dividing both sides by the nonzero constant gives</p>\[y_1 = - \frac{k_2}{k_1}y_2 \quad \text{or} \quad y_2 = - \frac{k_1}{k_2}y_1\]<p>which shows that $y_1$ and $y_2$ are proportional.</p><h3 id="basis-general-solution-and-particular-solution">Basis, General Solution, and Particular Solution</h3><p>Returning to our discussion, for Eq. ($\ref{eqn:general_sol}$) to be a general solution, $y_1$ and $y_2$ must be solutions to Eq. ($\ref{eqn:homogeneous_linear_ode}$) and also be linearly independent (not proportional to each other) on the interval $I$. A pair of solutions $(y_1, y_2)$ of Eq. ($\ref{eqn:homogeneous_linear_ode}$) that are linearly independent on an interval $I$ is called a <strong>basis</strong> or a <strong>fundamental system</strong> of solutions for Eq. ($\ref{eqn:homogeneous_linear_ode}$) on $I$.</p><p>By using the initial conditions to determine the two constants $c_1$ and $c_2$ in the general solution ($\ref{eqn:general_sol}$), we obtain a unique solution that passes through the point $(x_0, K_0)$ and has a slope of $K_1$ at that point. This is called a <strong>particular solution</strong> of the ODE ($\ref{eqn:homogeneous_linear_ode}$).</p><p>If the coefficients $p$ and $q$ of Eq. ($\ref{eqn:homogeneous_linear_ode}$) are continuous on an open interval $I$, the equation is guaranteed to have a general solution, and this general solution includes all possible particular solutions. In this case, Eq. 
($\ref{eqn:homogeneous_linear_ode}$) does not have a singular solution that cannot be obtained from the general solution.</p><h2 id="reduction-of-order">Reduction of Order</h2><p>If we can find one solution to a second-order homogeneous ODE, we can find a second, linearly independent solution by solving a first-order ODE as follows; the two solutions together then form a basis. This method is called <strong>reduction of order</strong>.</p><p>For a second-order homogeneous ODE in <u>standard form with $y^{\prime\prime}$, not $f(x)y^{\prime\prime}$</u>,</p>\[y^{\prime\prime} + p(x)y^\prime + q(x)y = 0\]<p>let’s assume we know one solution $y_1$ on an open interval $I$.</p><p>Now, let’s set the second solution we are looking for as $y_2 = uy_1$, and substitute</p>\[\begin{align*} y &amp;= y_2 = uy_1, \\ y^{\prime} &amp;= y_2^{\prime} = u^{\prime}y_1 + uy_1^{\prime}, \\ y^{\prime\prime} &amp;= y_2^{\prime\prime} = u^{\prime\prime}y_1 + 2u^{\prime}y_1^{\prime} + uy_1^{\prime\prime} \end{align*}\]<p>into the equation to get</p>\[(u^{\prime\prime}y_1 + 2u^{\prime}y_1^{\prime} + uy_1^{\prime\prime}) + p(u^{\prime}y_1 + uy_1^{\prime}) + quy_1 = 0 \tag{7}\]<p>Grouping the terms by $u^{\prime\prime}$, $u^{\prime}$, and $u$ gives</p>\[y_1u^{\prime\prime} + (py_1+2y_1^{\prime})u^{\prime} + (y_1^{\prime\prime} + py_1^{\prime} + qy_1)u = 0\]<p>However, since $y_1$ is a solution to the given equation, the expression in the last parentheses is $0$. Thus, the term with $u$ disappears, leaving an ODE in terms of $u^{\prime}$ and $u^{\prime\prime}$. 
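</p><p>As a concrete, purely illustrative check of this step, assuming SymPy is available, consider the equation $x^2y^{\prime\prime} - xy^{\prime} + y = 0$ (my own example, not part of the original derivation), whose standard form has $p = -1/x$ and $q = 1/x^2$, with the known solution $y_1 = x$:</p>

```python
# Illustrative check of the reduction-of-order substitution y2 = u*y1.
# Example equation (chosen for illustration only): x^2*y'' - x*y' + y = 0,
# i.e. standard form y'' + p*y' + q*y = 0 with p = -1/x, q = 1/x^2, y1 = x.
import sympy as sp

x = sp.symbols('x', positive=True)
u = sp.Function('u')
U = sp.Function('U')

p = -1/x        # coefficient p(x) in standard form
q = 1/x**2      # coefficient q(x) in standard form
y1 = x          # known solution: y1'' + p*y1' + q*y1 == 0

# Substitute y = u(x)*y1 into y'' + p*y' + q*y and expand.
y = u(x) * y1
expr = sp.expand(y.diff(x, 2) + p*y.diff(x) + q*y)

# The coefficient of u(x), namely y1'' + p*y1' + q*y1, vanishes because
# y1 itself solves the ODE; only the u'' and u' terms survive.
print(sp.simplify(expr.coeff(u(x))))    # -> 0

# The surviving equation y1*U' + (p*y1 + 2*y1')*U = 0 with U = u'
# is first order; for this example it reads x*U' + U = 0.
reduced = sp.Eq(y1*U(x).diff(x) + (p*y1 + 2*y1.diff(x))*U(x), 0)
print(sp.dsolve(reduced, U(x)))         # -> Eq(U(x), C1/x)
```

<p>Here $u = \int U dx$ is $\ln x$ up to constants, so this example yields the second solution $y_2 = x\ln x$, which is indeed linearly independent of $y_1 = x$.</p><p>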
Dividing the remaining ODE by $y_1$ and setting $u^{\prime}=U$ and $u^{\prime\prime}=U^{\prime}$, we obtain the following first-order ODE.</p>\[U^{\prime} + \left(\frac{2y_1^{\prime}}{y_1} + p \right) U = 0.\]<p>Using <a href="/posts/Separation-of-Variables/">Separation of Variables</a> and integrating,</p>\[\begin{align*} \frac{dU}{U} &amp;= - \left(\frac{2y_1^{\prime}}{y_1} + p \right) dx \\ \ln|U| &amp;= -2\ln|y_1| - \int p dx \end{align*}\]<p>and taking the exponential of both sides, we finally get</p>\[U = \frac{1}{y_1^2}e^{-\int p dx} \tag{8}\]<p>Since we set $U=u^{\prime}$, we have $u=\int U dx$. The second solution $y_2$ we are looking for is</p>\[y_2 = uy_1 = y_1 \int U dx\]<p>Since Eq. (8) guarantees $U&gt;0$, the ratio $\cfrac{y_2}{y_1} = u = \int U dx$ cannot be a constant, and so $y_1$ and $y_2$ form a basis of solutions.</p><h3 id="applications-of-reduction-of-order">Applications of Reduction of Order</h3><p>A general second-order ODE $F(x, y, y^\prime, y^{\prime\prime})=0$, whether linear or nonlinear, can be reduced to a first-order ODE using reduction of order when $y$ does not appear explicitly, when $x$ does not appear explicitly, or, as seen before, when the equation is homogeneous linear and one solution is already known.</p><h4 id="case-where-y-does-not-appear-explicitly">Case where $y$ does not appear explicitly</h4><p>In $F(x, y^\prime, y^{\prime\prime})=0$, setting $z=y^{\prime}$ reduces the equation to a first-order ODE in $z$, $F(x, z, z^{\prime})=0$.</p><h4 id="case-where-x-does-not-appear-explicitly">Case where $x$ does not appear explicitly</h4><p>In $F(y, y^\prime, y^{\prime\prime})=0$, setting $z=y^{\prime}$ gives $y^{\prime\prime} = \cfrac{d y^{\prime}}{dx} = \cfrac{d y^{\prime}}{dy}\cfrac{dy}{dx} = \cfrac{dz}{dy}z$. This reduces the equation to a first-order ODE in $z$, $F(y,z,z^\prime)=0$, where $y$ takes the role of the independent variable $x$.</p>]]> </content> </entry> </feed>
