Quasimorphic: Alán F. Muñoz's blog

Set up email hosting and a personal website on a personal domain

Mon, 24 Nov 2025 19:38:00 -0500

At some point I was struggling to get access to my Gmail account. Since I usually block unwanted scripts from running on my computer, Google likes to flag my Gmail login attempts as suspicious activity. This would be fine if it didn’t also make SMS one of my only alternative identification methods. If I were to lose my phone or number, I could be locked out of my account. Most accounts I have assume I have access to this email, so a good chunk of modern life requires me to access it, and the prospect of becoming unable to log in seems quite realistic. This post (by an email-hosting company) builds a case against Gmail on privacy grounds. It was finally time to get my own domain and control over my email. I also started this blog this year, and it is pleasing to give it a nice .com home.

Use dired-do-shell to explore the parquet schema from Emacs

Thu, 23 Oct 2025 15:21:00 -0400

I use dired-do-shell-command in Emacs to run CLI commands from within its file manager, dired. This workflow makes it easy to perform batch operations on files that would otherwise be annoying. The trouble arose when I tried to use the duckdb CLI to print the schema of a parquet file, as the notation for wildcards in Emacs (* and ?) conflicts with duckdb’s usage of the former. Thus running the following after M-x dired-do-shell-command (bound to ! in dired-mode) did not work:

Calculate the cumulative sum of a column using DuckDB

Wed, 22 Oct 2025 20:34:00 -0400

DuckDB, the (tabular) data exploration tool I use, supports window functions. I recently discovered that it can also compute cumulative sums very efficiently.

Let us generate a toy dataset where we want to calculate the sum of one column relative to the order of another one.

CREATE OR REPLACE TABLE seed AS SELECT SETSEED(0.1); -- seeding for reproducibility, creating a table to hide output
-- Create a mock dataset with two integer columns
CREATE OR REPLACE TABLE my_table AS
SELECT
    #1 AS column_1,
    CAST(FLOOR(RANDOM() * 100) AS INT) AS column_2
FROM generate_series(1, 10); -- This generates 10 rows
SELECT * FROM my_table;
-- We write it to a CSV file for future use
COPY my_table TO 'my_table.csv';

┌──────────┬──────────┐
│ column_1 │ column_2 │
│  int64   │  int32   │
├──────────┼──────────┤
│        1 │       27 │
│        2 │       45 │
│        3 │        2 │
│        4 │       84 │
│        5 │       84 │
│        6 │       26 │
│        7 │       18 │
│        8 │       65 │
│        9 │       97 │
│       10 │       11 │
├──────────┴──────────┤
│ 10 rows   2 columns │
└─────────────────────┘

If we wanted to calculate the cumulative sum of column_2, we could use the OVER clause to sum it in the order defined by column_1.
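A minimal sketch of such a query, wrapped in a shell function that calls the duckdb CLI. It assumes duckdb is on the PATH and that the CSV has the column_1 and column_2 columns from the table above; the function name and the cumulative_sum alias are made up for illustration.

```shell
# Print column_1, column_2 and the running total of column_2
# (accumulated in column_1 order) for the CSV file given as $1.
# Assumes the duckdb CLI is installed.
cumulative_sum() {
    duckdb -c "
        SELECT
            column_1,
            column_2,
            SUM(column_2) OVER (ORDER BY column_1) AS cumulative_sum
        FROM '$1'
        ORDER BY column_1;
    "
}

# Usage, with the file written by the COPY statement:
# cumulative_sum my_table.csv
```

With an ORDER BY inside OVER, the window for each row runs from the first row up to the current one, which is what turns SUM into a running total. For the ten values shown in the table, the last running total should be 459, the sum of the whole column.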

Run multiple python scripts in the background

Tue, 26 Aug 2025 14:29:00 -0400

To solve a multitude of challenges I have faced when processing high-throughput microscopy data, I have developed Nahual, a tool that allows me to move data across multiple Python environments that deploy deep learning models in the background. I usually keep these models “listening” in the background for the main analysis pipeline (aliby) to send them data to process. To be able to monitor what’s going on inside these scripts I use GNU screen, which allows me to detach from and reattach to these sessions whenever I need to. At some point I had to reboot my server and had to rerun all of these in independent screens. This rudimentary shell script did the job:
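As a rough sketch of the idea (the function name, the DRY_RUN switch and the file names are hypothetical, and a single python interpreter is assumed instead of one environment per model), each script can be started in its own detached, named screen session:

```shell
# Start each script passed as an argument in its own detached screen session.
launch_in_screens() {
    for script in "$@"; do
        session=$(basename "$script" .py)   # name the session after the script
        # -d -m: start the session detached; -S: session name used to reattach.
        # Set DRY_RUN=1 to print the commands instead of running them.
        ${DRY_RUN:+echo} screen -dmS "$session" python "$script"
    done
}

# Hypothetical usage; these file names are placeholders:
# launch_in_screens serve_model_a.py serve_model_b.py
# screen -ls                 # list the running sessions
# screen -r serve_model_a    # reattach to one of them
```

Naming the sessions makes it possible to reattach to a specific model server later without looking up its process id.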

Simple progress indicators with awk

Tue, 19 Aug 2025 18:58:00 -0400

I wanted a simple way to see the progress of a data processing pipeline, and the internal progress bar tools were messed up by threading. I thus decided to use the number of output files in each folder as an indicator of progress. In my case the output of tree . looks like this:

.
└── steps
    ├── A01_001
    │   ├── segment_nuclei
    │   │   ├── 0000.npz
    │   │   ├── 0001.npz
    │   │   ├── ...
    │   │   └── 0019.npz
    │   ├── tile
    │   │   ├── 0000.npz
    │   │   ├── 0001.npz
    │   │   ├── ...

I can get the info I need by counting the total number of files and the occurrences of the A01_001 -> P24_005 range (these are fields of view from a microscopy experiment). Using this simple find command we get all the files in the current folder.
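A sketch of how the counting could go, assuming the layout shown in the tree output (steps/<field of view>/<step>/<file>.npz). The function name is made up, and the grouping key is one choice among several:

```shell
# Count the .npz output files per field of view and step under a root folder
# (defaults to the current directory).
count_outputs() {
    find "${1:-.}" -type f -name '*.npz' |
        awk -F/ '{ n[$(NF-2) "/" $(NF-1)]++ }   # key: field_of_view/step
                 END { for (k in n) print k, n[k] }' |
        sort
}

# Usage:
# count_outputs .
# find . -type f | wc -l    # total number of files
```

Splitting each path on "/" and counting from the end keeps the key independent of how deep the root folder is.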

Update figure numbering

Thu, 14 Aug 2025 15:42:00 -0400

I was editing some markdown and had to insert a new figure in the middle. The problem is that this document already has explicit figure numbering (e.g., “Figure 5”), so changing tens of figures by hand felt dull. I like to run small (GNU) awk scripts for this type of task.

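# update_figures.awk (GNU awk: uses the three-argument form of match())
{
    out = ""
    rest = $0
    # Visit every "Figure N" on the line, not only the first one
    while (match(rest, /Figure ([0-9]+)/, num)) {
        n = num[1] + 0
        if (n > after)
            n += increase_by
        out = out substr(rest, 1, RSTART - 1) "Figure " n
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}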

This turns Figure X into Figure X + increase_by whenever X is greater than the variable “after”. We can run it as follows:
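One possible invocation, as a sketch: both variables are passed with -v and the script is loaded with -f. It assumes gawk is installed and update_figures.awk sits in the current directory; the wrapper function and the file name report.md are placeholders.

```shell
# Shift every figure number greater than $1 by $2 in the file $3,
# printing the result to standard output.
renumber_figures() {
    gawk -v after="$1" -v increase_by="$2" -f update_figures.awk "$3"
}

# Example: a new figure was inserted after Figure 4, so Figure 5 and
# onwards move up by one.
# renumber_figures 4 1 report.md > report_renumbered.md
```

Writing to a new file instead of editing in place leaves the original untouched until the result has been checked.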

Recursive search and replace

Tue, 12 Aug 2025 13:07:00 -0400

I needed to replace all occurrences of one pattern with another, in a case where I knew there were no ambiguous matches. This uses ripgrep, xargs and GNU sed. source.

rg old_pattern --files-with-matches | xargs sed -i 's/old_pattern/new_pattern/g'
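A slightly more defensive variant of the same pipeline, as a sketch: file names are passed NUL-separated so that paths containing spaces survive, and xargs -r skips sed entirely when nothing matches. The function name is made up, and GNU xargs and GNU sed are assumed.

```shell
# Replace $1 with $2 in every file under the current directory that contains $1.
# $1 is used as both a ripgrep and a sed regex, so keep it to text that means
# the same in both; neither argument may contain "/".
replace_everywhere() {
    # The explicit "." keeps rg searching the directory even when stdin is a pipe.
    rg --files-with-matches --null -e "$1" . |
        xargs -0 -r sed -i "s/$1/$2/g"
}

# Preview the matches first, then replace:
# rg old_pattern
# replace_everywhere old_pattern new_pattern
```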

A workflow for bioimaging and data exploration

Wed, 30 Jul 2025 13:05:00 -0400

One of the common challenges when analysing large bioimaging datasets is bringing everything together in one place. I usually use tools like DuckDB for database querying and copairs for selecting statistically significant subsets of the data. For one of my recent projects I built a marimo interface that explores large-scale image-based profiles (~2TB of images, ~2GB of feature profiles), performs dimensionality reduction on the data, and finally retrieves the images. This, I think, is the ideal workflow: one where you can be nimble and pull up the images alongside the statistical analyses to interpret the data structure in its biological context. The code is not yet available to the public, but you can find the demo here.

GitHub code review on an existing code base

Tue, 26 Nov 2024 13:06:00 -0500

Create an empty branch with one empty commit

Create new branch: git checkout --orphan review-1-target
Reset: git reset .
Clean branch: git clean -df
Add empty commit: git commit --allow-empty -m 'Empty commit'

Rebase a branch to put this commit at the root

Push to your fork: git push -u origin review-1-target
Move to branch to review: git checkout origin/main
Spin off a branch from here: git checkout -b review-1
Rebase onto the empty branch: git rebase -i review-1-target (the empty commit must be at the start)
Push: git push -u origin review-1

That should make a pull request possible, providing the code review tooling. source

About me

I am a computational biologist with software engineering chops working at the interface of ML/DL and biology. I develop computational tools for high-throughput biological data and then use them to further my understanding of cell behaviour. I am particularly interested in interpretable AI for mechanistic understanding of biology and its applications in drug discovery, though I also enjoy learning about a wide array of topics in math and the computational/natural sciences.