<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>script on Alán's blog</title><link>https://quasimorphic.com/categories/script/</link><description>Recent content in script on Alán's blog</description><generator>Hugo</generator><language>en-uk</language><lastBuildDate>Wed, 22 Oct 2025 20:34:00 -0400</lastBuildDate><atom:link href="https://quasimorphic.com/categories/script/index.xml" rel="self" type="application/rss+xml"/><item><title>Calculate the cumulative sum of a column using DuckDB</title><link>https://quasimorphic.com/archive/duckdb_cumsum/</link><pubDate>Wed, 22 Oct 2025 20:34:00 -0400</pubDate><guid>https://quasimorphic.com/archive/duckdb_cumsum/</guid><description>&lt;p>DuckDB, the (tabular) data exploration tool I use, supports window operations. I recently discovered that it can also perform cumulative sums in a very efficient manner.&lt;/p>
&lt;p>Let us generate a toy dataset where we want to calculate the sum of one column relative to the order of another one.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">-- seeding for reproducibility, creating a table to hide output
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">OR&lt;/span> &lt;span style="color:#66d9ef">REPLACE&lt;/span> &lt;span style="color:#66d9ef">TABLE&lt;/span> seed &lt;span style="color:#66d9ef">AS&lt;/span> &lt;span style="color:#66d9ef">SELECT&lt;/span> SETSEED(&lt;span style="color:#ae81ff">0&lt;/span>.&lt;span style="color:#ae81ff">1&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a mock dataset with two integer columns
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">OR&lt;/span> &lt;span style="color:#66d9ef">REPLACE&lt;/span> &lt;span style="color:#66d9ef">TABLE&lt;/span> my_table &lt;span style="color:#66d9ef">AS&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">#&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> column_1,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CAST&lt;/span>(FLOOR(RANDOM() &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#ae81ff">100&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> INT) &lt;span style="color:#66d9ef">AS&lt;/span> column_2
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> generate_series(&lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#ae81ff">10&lt;/span>); &lt;span style="color:#75715e">-- This generates 10 rows
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#66d9ef">FROM&lt;/span> my_table;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- We write it to a csv for future use
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">COPY&lt;/span> my_table &lt;span style="color:#66d9ef">TO&lt;/span> my_table.csv;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>┌──────────┬──────────┐
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ column_1 │ column_2 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ int64 │ int32 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├──────────┼──────────┤
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 1 │ 27 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 2 │ 45 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 3 │ 2 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 4 │ 84 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 5 │ 84 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 6 │ 26 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 7 │ 18 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 8 │ 65 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 9 │ 97 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 10 │ 11 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├──────────┴──────────┤
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 10 rows 2 columns │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└─────────────────────┘
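&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To calculate the cumulative sum of the table we can use a window function: the &lt;code>OVER&lt;/code> clause makes &lt;code>sum&lt;/code> accumulate &lt;code>column_2&lt;/code> in the order defined by &lt;code>column_1&lt;/code>. With an &lt;code>ORDER BY&lt;/code> and no explicit frame, the window spans from the first row up to the current one, which is exactly a running total.&lt;/p>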
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span>, &lt;span style="color:#66d9ef">sum&lt;/span>(column_2) OVER (&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">by&lt;/span> column_1) &lt;span style="color:#66d9ef">AS&lt;/span> cumulative_sum &lt;span style="color:#66d9ef">FROM&lt;/span> my_table
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>┌──────────┬──────────┬────────────────┐
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ column_1 │ column_2 │ cumulative_sum │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ int64 │ int32 │ int128 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├──────────┼──────────┼────────────────┤
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 1 │ 27 │ 27 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 2 │ 45 │ 72 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 3 │ 2 │ 74 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 4 │ 84 │ 158 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 5 │ 84 │ 242 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 6 │ 26 │ 268 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 7 │ 18 │ 286 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 8 │ 65 │ 351 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 9 │ 97 │ 448 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 10 │ 11 │ 459 │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├──────────┴──────────┴────────────────┤
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│ 10 rows 3 columns │
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└──────────────────────────────────────┘
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The cumulative sum can be pretty handy to get a general notion of a distribution. As a bonus tip, I&amp;rsquo;ll show how to use duckdb in a one-liner to plot the data
directly in a terminal by using &lt;a href="https://gnuplotting.org/">gnuplot&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>duckdb -csv -c &lt;span style="color:#e6db74">&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> SELECT *, sum(column_2) OVER (ORDER by column_1) AS cumulative_sum
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> FROM read_csv(&amp;#39;my_table.csv&amp;#39;);&amp;#34;&lt;/span> |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> gnuplot -e &lt;span style="color:#e6db74">&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set terminal dumb;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set datafile separator &amp;#39;,&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set style data histograms;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set style fill solid 1.00 border -1;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set xlabel &amp;#39;Column 1&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set ylabel &amp;#39;CSum&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> set title &amp;#39;Cumulative Sum of values&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> plot &amp;#39;-&amp;#39; using 3:xtic(1);&amp;#34;&lt;/span> |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tr -d &lt;span style="color:#e6db74">&amp;#39;\014&amp;#39;&lt;/span> &lt;span style="color:#75715e"># Remove a pesky ^L at the top&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Cumulative Sum of values
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 500 +-----------------------------------------------------------------+
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | + + + + + + + + + ++ |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 450 |-+ &amp;#39;-&amp;#39; using 3:xtic+-+ +-||--+-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 400 |-+ |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 350 |-+ ++ |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | || |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 300 |-+ +-+ || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 250 |-+ ++ |#| || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>CSum | +-+ || |#| || |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 200 |-+ |#| || |#| || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | |#| || |#| || |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 150 |-+ ++ |#| || |#| || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | || |#| || |#| || |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 100 |-+ +-+ || |#| || |#| || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 50 |-+ ++ |#| || |#| || |#| || |#| || +-|
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> | +-+ || |#| || |#| || |#| || |#| || |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 0 +-----------------------------------------------------------------+
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 1 2 3 4 5 6 7 8 9 10
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Column 1
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We get a cute ASCII-like plot! That is a bit long for a &amp;ldquo;one-liner&amp;rdquo;, so I&amp;rsquo;ll go through the commands:&lt;/p>
&lt;ul>
&lt;li>Run a &lt;code>duckdb&lt;/code> command (&lt;code>-c&lt;/code>) that reads the previously saved table. The &lt;code>-csv&lt;/code> flag at the start converts the output to CSV.&lt;/li>
&lt;li>Run gnuplot with certain specifications:
&lt;ul>
&lt;li>The &lt;code>-e&lt;/code> flag allows passing a series of commands without an interactive session.&lt;/li>
&lt;li>&lt;code>set terminal dumb&lt;/code>: renders the plot as plain text on stdout.&lt;/li>
&lt;li>&lt;code>set datafile separator ','&lt;/code>: the input is a CSV file.&lt;/li>
&lt;li>&lt;code>set style data histograms&lt;/code>: changes the plotting style to a bar plot.&lt;/li>
&lt;li>&lt;code>set style fill solid ...&lt;/code>: visual adjustments to the bars for clarity.&lt;/li>
&lt;li>&lt;code>set xlabel ...&lt;/code>: adds the x-axis label; &lt;code>ylabel&lt;/code> and &lt;code>title&lt;/code> work the same way.&lt;/li>
&lt;li>&lt;code>plot '-' using 3:xtic(1)&lt;/code>: reads the data from stdin and plots column 3 (&lt;code>cumulative_sum&lt;/code>) on the y-axis, using the first column (&lt;code>column_1&lt;/code>) as x-axis tick labels.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Lastly, use the &lt;code>tr&lt;/code> command-line tool to remove a &lt;code>^L&lt;/code> (form feed) that appeared at the start of the output and was bothering me too much.&lt;/li>
&lt;/ul>
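&lt;p>As a cross-check of the &lt;code>cumulative_sum&lt;/code> column shown earlier, here is a minimal Python sketch of what the window computes: sort by &lt;code>column_1&lt;/code>, then keep a running total of &lt;code>column_2&lt;/code>. The values are copied from the toy table; the function name &lt;code>cumulative_sum&lt;/code> is only illustrative, and this is not how DuckDB implements the window internally.&lt;/p>

```python
from itertools import accumulate

# (column_1, column_2) pairs copied from the toy table in the post.
rows = [(1, 27), (2, 45), (3, 2), (4, 84), (5, 84),
        (6, 26), (7, 18), (8, 65), (9, 97), (10, 11)]


def cumulative_sum(rows):
    """Return (column_1, column_2, running_total) tuples ordered by column_1."""
    ordered = sorted(rows, key=lambda row: row[0])  # plays the role of ORDER BY column_1
    totals = accumulate(value for _, value in ordered)  # running sum of column_2
    return [(key, value, total) for (key, value), total in zip(ordered, totals)]


if __name__ == "__main__":
    for key, value, total in cumulative_sum(rows):
        print(key, value, total)
```

&lt;p>The last running total is 459, matching the final row of the DuckDB output. The sketch assumes the ordering column has no ties, as is the case in the toy table.&lt;/p>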
&lt;p>While there are many other ways to wrangle tables, such as via pandas or polars in Python, I find duckdb to be a powerful tool for exploratory analyses and data wrangling (often from within Python). It is flexible enough to be used by itself, via bindings in another language, or directly on the command line. Lastly, I showed that when used as a Command Line Interface (CLI) duckdb synergises with other tools for data visualisation from the comfort(?) of the terminal.&lt;/p></description></item><item><title>Run multiple python scripts in the background</title><link>https://quasimorphic.com/archive/screen-batch-model-deployment/</link><pubDate>Tue, 26 Aug 2025 14:29:00 -0400</pubDate><guid>https://quasimorphic.com/archive/screen-batch-model-deployment/</guid><description>&lt;p>To solve a multitude of challenges I have faced when processing high-throughput microscopy data, I have developed &lt;a href="https://github.com/afermg/nahual">Nahual&lt;/a>, a tool that allows me to move data across multiple Python environments that deploy deep learning models in the background. I usually keep these models &amp;ldquo;listening&amp;rdquo; in the background for the main analysis pipeline (&lt;a href="https://github.com/afermg/aliby">aliby&lt;/a>) to send them data to process. To be able to monitor what&amp;rsquo;s going on inside of these scripts I use &lt;a href="https://www.gnu.org/software/screen/">GNU screen&lt;/a>, which allows me to detach from and reattach to these sessions whenever I need to. At some point I had to reboot my server and had to rerun all of these in independent screens. This rudimentary shell script did the job:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>cd cellpose
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>screen -d -S cellpose1 -m bash -c &lt;span style="color:#e6db74">&amp;#39;nix develop . --command bash -c &amp;#34;python server.py ipc:///tmp/cellpose1.ipc&amp;#34;&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>screen -d -S cellpose2 -m bash -c &lt;span style="color:#e6db74">&amp;#39;nix develop . --command bash -c &amp;#34;python server.py ipc:///tmp/cellpose2.ipc&amp;#34;&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cd ../trackastra
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>screen -d -S trackastra -m bash -c &lt;span style="color:#e6db74">&amp;#39;nix develop . --command bash -c &amp;#34;python server.py ipc:///tmp/trackastra.ipc&amp;#34;&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cd ..
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Basically, each screen enters my Nix environment and deploys the model (in this case, my &lt;a href="https://github.com/afermg/nahual">fork&lt;/a> of cellpose with Nix dependency management) while detached. This executes a &lt;code>server.py&lt;/code> file within the Nix environment, which runs in a loop waiting to receive data and process it. Automatically deploying to multiple screens reduces the annoyance of repeating the usual steps (go to the folder -&amp;gt; run screen -&amp;gt; Nix environment -&amp;gt; run Python server -&amp;gt; detach screen session). I just add more models if I want further deployments, put it in a bash script and call it a day.&lt;/p>
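&lt;p>If the list of deployments grows, the same commands could be generated from a table instead of being written out by hand. The following is a minimal Python sketch of that idea, not part of Nahual: the names &lt;code>DEPLOYMENTS&lt;/code>, &lt;code>screen_command&lt;/code> and &lt;code>launch_plan&lt;/code> are hypothetical, and the sketch only builds the argument lists rather than starting anything.&lt;/p>

```python
import shlex

# Hypothetical deployment table: (folder, screen session name) pairs
# mirroring the three sessions of the shell script in the post.
DEPLOYMENTS = [
    ("cellpose", "cellpose1"),
    ("cellpose", "cellpose2"),
    ("trackastra", "trackastra"),
]


def screen_command(session: str) -> list[str]:
    """Build the argv for one detached screen session running server.py under Nix."""
    server = f"python server.py ipc:///tmp/{session}.ipc"
    # Quote the inner command so it reaches the inner `bash -c` as one argument.
    nix = f"nix develop . --command bash -c {shlex.quote(server)}"
    return ["screen", "-d", "-S", session, "-m", "bash", "-c", nix]


def launch_plan(deployments):
    """Pair each folder with the command that should be run inside it."""
    return [(folder, screen_command(session)) for folder, session in deployments]


if __name__ == "__main__":
    # Dry run: print what would be executed in each folder.
    for folder, argv in launch_plan(DEPLOYMENTS):
        print(f"[{folder}] {shlex.join(argv)}")
```

&lt;p>To actually start the sessions, each pair could be passed to &lt;code>subprocess.run(argv, cwd=folder)&lt;/code>. Printing the plan first makes it easy to compare against the hand-written script before launching anything.&lt;/p>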
&lt;p>To access any of these screens for inspection I just use the name indicated after the &lt;code>-S&lt;/code> flag (e.g., &lt;code>screen -r cellpose1&lt;/code>). This way I can check if any issue crops up in the main analysis script or pipeline.&lt;/p></description></item><item><title>Simple progress indicators with awk</title><link>https://quasimorphic.com/archive/awk-simple-progress-indicator/</link><pubDate>Tue, 19 Aug 2025 18:58:00 -0400</pubDate><guid>https://quasimorphic.com/archive/awk-simple-progress-indicator/</guid><description>&lt;p>I wanted a simple way to see the progress of a data processing pipeline, and the internal progress bar tools were messed up by threading. I thus decided to use the number of output files in each folder as an indicator of progress. In my case the output of &lt;code>tree .&lt;/code> looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>.
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└── steps
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ├── A01_001
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   ├── segment_nuclei
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0000.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0001.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── ...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   └── 0019.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   ├── tile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0000.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0001.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>I can get the info I need by counting the total number of files and the occurrences of the &lt;code>A01_001&lt;/code> -&amp;gt; &lt;code>P24_005&lt;/code> range (these are fields of view from a microscopy experiment). Using this simple &lt;code>find&lt;/code> command we get all the files in the current folder.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>find . -type f
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>which results in this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>./steps:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0007.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0009.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0018.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0016.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We could use &lt;code>wc -l&lt;/code> to get the number of files per directory, but we want a bunch of progress bars to get a better sense of change over time. For this I use &lt;code>awk&lt;/code>, my Swiss Army knife for text processing, and I write a short script (it needs GNU &lt;code>awk&lt;/code> for the three-argument &lt;code>match&lt;/code> and for &lt;code>asorti&lt;/code>) that counts, &lt;a href="https://stackoverflow.com/a/2458455">sorts&lt;/a> and &lt;a href="https://stackoverflow.com/a/68371463">prints&lt;/a> the number of occurrences as a number of dots. I also added a conditional to only track a folder once more than one file has been produced, for pipelines that save one file before actually running the whole pipeline.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-awk" data-lang="awk">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># progress_bar.awk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#66d9ef">match&lt;/span>(&lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>,&lt;span style="color:#e6db74">&amp;#34;([A-P][0-9]{2}_[0-9]{3})&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">capture&lt;/span>)){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">count&lt;/span>[&lt;span style="color:#a6e22e">capture&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>]] &lt;span style="color:#f92672">+=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>END{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">n&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">asorti&lt;/span>(&lt;span style="color:#a6e22e">count&lt;/span>, &lt;span style="color:#a6e22e">sorted&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> (&lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>; &lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">&amp;lt;=&lt;/span>&lt;span style="color:#a6e22e">n&lt;/span>; &lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">++&lt;/span>){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">nfiles&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">count&lt;/span>[&lt;span style="color:#a6e22e">sorted&lt;/span>[&lt;span style="color:#a6e22e">i&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">nfiles&lt;/span> &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">s&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">sprintf&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;%*s&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">nfiles&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">gsub&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;.&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;.&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">s&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">print&lt;/span> &lt;span style="color:#a6e22e">sorted&lt;/span>[&lt;span style="color:#a6e22e">i&lt;/span>] &lt;span style="color:#e6db74">&amp;#34; &amp;#34;&lt;/span> &lt;span style="color:#a6e22e">s&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Running the &lt;code>find&lt;/code> command and the &lt;code>awk&lt;/code> script (&lt;code>find . -type f | awk -f progress_bar.awk&lt;/code>) yields the following snapshot of the processing progress:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>A01_001 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_002 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_003 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_004 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_005 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_001 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_002 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_003 .................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_004 ..........................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_005 ........................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A03_001 ..............................................
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Thus the last thing to do is to use &lt;code>watch&lt;/code> to automatically refresh the status:&lt;/p>
&lt;p>&lt;code>watch -dc --interval 1 'find . -type f | awk -f progress_bar.awk | tac'&lt;/code>&lt;/p>
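&lt;p>For comparison, the counting step of &lt;code>progress_bar.awk&lt;/code> can be sketched in Python. This is an illustrative port under the same assumptions as the awk script (field-of-view names such as &lt;code>A01_003&lt;/code> appear somewhere in each path); the names &lt;code>FOV&lt;/code> and &lt;code>progress_lines&lt;/code> are hypothetical.&lt;/p>

```python
import re
from collections import Counter

# Same pattern as the awk script: a letter A-P, two digits, underscore, three digits.
FOV = re.compile(r"[A-P][0-9]{2}_[0-9]{3}")


def progress_lines(paths):
    """Return one 'NAME ....' line per field of view, with one dot per file found."""
    counts = Counter()
    for path in paths:
        found = FOV.search(path)
        if found:
            counts[found.group(0)] += 1
    # Like the awk version, skip folders that only hold a single file so far.
    return [f"{name} {'.' * n}" for name, n in sorted(counts.items()) if n > 1]


if __name__ == "__main__":
    # Sample paths copied from the `find . -type f` output in the post.
    sample = [
        "./steps/A01_003/tile/0007.npz",
        "./steps/A01_003/tile/0009.npz",
        "./steps/A01_003/tile/0018.npz",
        "./steps/A01_003/tile/0016.npz",
    ]
    for line in progress_lines(sample):
        print(line)
```

&lt;p>Keeping the counting in a function that takes any iterable of paths means it could be fed from a file listing or from a directory walk without changes.&lt;/p>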
&lt;p>The &lt;code>watch&lt;/code> flag &lt;code>-d&lt;/code> highlights the changes between refreshes and &lt;code>-c&lt;/code> enables interpreting ANSI colours; in my terminal this makes the highlighted changes stay visible for longer, but YMMV. Finally, &lt;code>tac&lt;/code> reverses the output so that the last lines are displayed at the top. I like to run this command in another terminal or in a &lt;code>screen&lt;/code> terminal multiplexer. When the number of rows becomes too high it may be useful to find a heuristic to remove uninformative lines.&lt;/p></description></item><item><title>Update figure numbering</title><link>https://quasimorphic.com/archive/awk-update-figure-numbering/</link><pubDate>Thu, 14 Aug 2025 15:42:00 -0400</pubDate><guid>https://quasimorphic.com/archive/awk-update-figure-numbering/</guid><description>&lt;p>I was editing some markdown and had to insert a new figure in the middle. The problem is that this document already has explicit figure numbering (e.g., &amp;ldquo;Figure 5&amp;rdquo;), so changing tens of figures by hand felt dull. I like to run small (GNU) &lt;code>awk&lt;/code> scripts for this type of task.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-awk" data-lang="awk">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># update_figures.awk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#66d9ef">match&lt;/span>(&lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Figure ([0-9]+)&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">num&lt;/span>)){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">num&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>] &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#a6e22e">after&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">gsub&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;Figure ([0-9]+)&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Figure &amp;#34;&lt;/span> &lt;span style="color:#a6e22e">num&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>] &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#a6e22e">increase_by&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">print&lt;/span> &lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
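&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This changes Figure &lt;code>X&lt;/code> into Figure &lt;code>X&lt;/code> + &lt;code>increase_by&lt;/code> for every figure numbered higher than the variable &lt;code>after&lt;/code>. Note that it assumes at most one figure reference per line, since &lt;code>gsub&lt;/code> rewrites every match on the line using the first number found. We can run it as follows:&lt;/p>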
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>awk -v after&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">4&lt;/span> -v increase_by&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span> -f update_figures.awk input_file.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To edit the file in-place with GNU &lt;code>awk&lt;/code> (4.1 or later), add &lt;code>-i inplace&lt;/code> before the other options.&lt;/p></description></item></channel></rss>