<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>awk on Alán's blog</title><link>https://quasimorphic.com/tags/awk/</link><description>Recent content in awk on Alán's blog</description><generator>Hugo</generator><language>en-uk</language><lastBuildDate>Tue, 19 Aug 2025 18:58:00 -0400</lastBuildDate><atom:link href="https://quasimorphic.com/tags/awk/index.xml" rel="self" type="application/rss+xml"/><item><title>Simple progress indicators with awk</title><link>https://quasimorphic.com/archive/awk-simple-progress-indicator/</link><pubDate>Tue, 19 Aug 2025 18:58:00 -0400</pubDate><guid>https://quasimorphic.com/archive/awk-simple-progress-indicator/</guid><description>&lt;p>I wanted a simple way to see the progress of a data processing pipeline, and the internal progress bar tools were messed up by threading. I thus decided to use the number of output files in each folder as an indicator of progress. In my case the output of &lt;code>tree .&lt;/code> looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>.
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└── steps
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ├── A01_001
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   ├── segment_nuclei
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0000.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0001.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── ...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   └── 0019.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   ├── tile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0000.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── 0001.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> │   │   ├── ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>I can get the info I need by counting the total number of files and the occurrences of the &lt;code>A01_001&lt;/code> -&amp;gt; &lt;code>P24_005&lt;/code> range (these are fields of view from a microscopy experiment). Using this simple &lt;code>find&lt;/code> command we get all the files in the current folder.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>find . -type f
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>which results in this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>./steps:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0007.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0009.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0018.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./steps/A01_003/tile/0016.npz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We could use &lt;code>wc -l&lt;/code> to get the number of files per directory, but I want a row of progress bars to get a better sense of change over time. For this I use (GNU) &lt;code>awk&lt;/code>, my Swiss Army knife for text processing, and a short script that counts, &lt;a href="https://stackoverflow.com/a/2458455">sorts&lt;/a> and &lt;a href="https://stackoverflow.com/a/68371463">prints&lt;/a> the number of occurrences of each field of view as a row of dots. I also added a conditional so that a field of view is only shown once it has more than one file, for pipelines that save a single file before actually running the whole pipeline.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-awk" data-lang="awk">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># progress_bar.awk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#66d9ef">match&lt;/span>(&lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>,&lt;span style="color:#e6db74">&amp;#34;([A-P][0-9]{2}_[0-9]{3})&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">capture&lt;/span>)){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">count&lt;/span>[&lt;span style="color:#a6e22e">capture&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>]] &lt;span style="color:#f92672">+=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>END{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">n&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">asorti&lt;/span>(&lt;span style="color:#a6e22e">count&lt;/span>, &lt;span style="color:#a6e22e">sorted&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> (&lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>; &lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">&amp;lt;=&lt;/span>&lt;span style="color:#a6e22e">n&lt;/span>; &lt;span style="color:#a6e22e">i&lt;/span>&lt;span style="color:#f92672">++&lt;/span>){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">nfiles&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">count&lt;/span>[&lt;span style="color:#a6e22e">sorted&lt;/span>[&lt;span style="color:#a6e22e">i&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">nfiles&lt;/span> &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">s&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">sprintf&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;%*s&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">nfiles&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">gsub&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;.&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;.&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">s&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">print&lt;/span> &lt;span style="color:#a6e22e">sorted&lt;/span>[&lt;span style="color:#a6e22e">i&lt;/span>] &lt;span style="color:#e6db74">&amp;#34; &amp;#34;&lt;/span> &lt;span style="color:#a6e22e">s&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Running the &lt;code>find&lt;/code> command and the &lt;code>awk&lt;/code> script (&lt;code>find . -type f | awk -f progress_bar.awk&lt;/code>) yields the following snapshot of the processing progress:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>A01_001 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_002 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_003 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_004 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A01_005 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_001 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_002 ...............................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_003 .................................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_004 ..........................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A02_005 ........................................
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>A03_001 ..............................................
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Thus the last thing to do is to use &lt;code>watch&lt;/code> to automatically refresh the status:&lt;/p>
&lt;p>&lt;code>watch -dc --interval 1 'find . -type f | awk -f progress_bar.awk | tac'&lt;/code>&lt;/p>
&lt;p>The &lt;code>watch&lt;/code> flag &lt;code>-d&lt;/code> highlights the changes between refreshes and &lt;code>-c&lt;/code> enables interpreting ANSI colours; in my terminal this makes the highlighted changes stay visible for longer, but YMMV. Finally, &lt;code>tac&lt;/code> reverses the output so that the last lines are displayed at the top. I like to run this command in another terminal or inside the &lt;code>screen&lt;/code> terminal multiplexer. When the number of rows becomes too high it may be useful to find a heuristic to remove uninformative lines.&lt;/p></description></item><item><title>Update figure numbering</title><link>https://quasimorphic.com/archive/awk-update-figure-numbering/</link><pubDate>Thu, 14 Aug 2025 15:42:00 -0400</pubDate><guid>https://quasimorphic.com/archive/awk-update-figure-numbering/</guid><description>&lt;p>I was editing some markdown and had to insert a new figure in the middle. The problem is that this document already has explicit figure numbering (e.g., &amp;ldquo;Figure 5&amp;rdquo;), so changing tens of figures by hand felt dull. I like to run small (GNU) &lt;code>awk&lt;/code> scripts for this type of task.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-awk" data-lang="awk">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># update_figures.awk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#66d9ef">match&lt;/span>(&lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Figure ([0-9]+)&amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">num&lt;/span>)){
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">num&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>] &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#a6e22e">after&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">gsub&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;Figure ([0-9]+)&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Figure &amp;#34;&lt;/span> &lt;span style="color:#a6e22e">num&lt;/span>[&lt;span style="color:#ae81ff">1&lt;/span>] &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#a6e22e">increase_by&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">print&lt;/span> &lt;span style="color:#f92672">$&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
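&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This changes Figure &lt;code>X&lt;/code> into Figure &lt;code>X&lt;/code> + &lt;code>increase_by&lt;/code> for every figure number greater than the variable &lt;code>after&lt;/code>. Note that &lt;code>gsub&lt;/code> rewrites every match on a line with the number computed from the first one, so the script assumes at most one figure reference per line. We can run it as follows:&lt;/p>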
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>awk -v after&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">4&lt;/span> -v increase_by&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span> -f update_figures.awk input_file.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To edit the file in place, add &lt;code>-i inplace&lt;/code> (GNU awk 4.1 or later).&lt;/p></description></item></channel></rss>