Emir's blog
https://emiruz.com/
Recent content on Emir's blog
Hugo -- gohugo.io
en-gb
Wed, 28 Dec 2022 00:00:00 +0000

SQL + M4 = Composable SQL
https://emiruz.com/post/2022-12-28-composable-sql/
Wed, 28 Dec 2022 00:00:00 +0000
Introduction: I often work with clients who have large “data lakes” or big star-schema-style enterprise databases with fact and dimension tables as far as the eye can see. Invariably, said clients end up with a substantial SQL codebase composed of hundreds of independent queries with lots of overlap between them. I want to be able to treat SQL repositories like I’d treat other code bases. That is, I’d like to create libraries, share code, test blocks independently, and so on.

A beautiful embedding applied to defect detection
https://emiruz.com/post/2022-11-16-defect-detection/
Wed, 16 Nov 2022 00:00:00 +0000
Introduction: “Data science” has a handful of fundamental metaphors for problem solving, few more versatile than the “point cloud”. That is, translate your data into points in an n-dimensional metric space and then do linear algebra to it. The point cloud metaphor applies most simply to numeric tabular data, but with a little creativity it readily extends to text, images, time-series and so on. In this post I’m going to tackle the KolektorSDD2 image dataset – a collection of normal and defective surfaces for some unnamed product – by using the point cloud metaphor to create a simple custom embedding and then exploiting it with elementary regression methods.

A fixed effect UK house price imputation model
https://emiruz.com/post/2022-05-21-uk-houses/
Sat, 21 May 2022 00:00:00 +0000
Summary: I show how assumptions about price structure can be used to build a compelling fixed effect (deterministic) price imputation model for the UK residential housing market. The model uses only public price paid data. I describe how the data is collected and processed, how the model is designed, and how it is fitted using the JAX Python package. I showcase some results, discuss shortcomings, and highlight further work needed before the model can be used for decision making under uncertainty.

Fast thinking on lichess.org
https://emiruz.com/post/2022-04-15-lichess1/
Fri, 15 Apr 2022 00:00:00 +0000
Summary: I use lichess.org games data to investigate the extent to which fast thinking is the dominant factor affecting game outcomes at any time control. I show how to (1) frame a pseudo-experiment, (2) database lichess.org data, and (3) carry out the analysis. I argue that fast thinking is most prominent in quick games. I analyse a sample containing games from pairs of users who have played each other at multiple time controls and show that win probabilities established using 180 sec Blitz games are heavily discounted in 600 sec Rapid games.

The Moore-Penrose inverse
https://emiruz.com/post/2022-02-01-moore-penrose-inverse/
Tue, 01 Feb 2022 00:00:00 +0000
Linear algebra is a fairly known quantity. It has a rich theory, there are lots of useful results, much of the key stuff is not so hard to understand, there are lots of geometric analogies, and so on. If you can get your data into a matrix format, you’re off to the races with just the linear algebra toolbox. All the lovely things I use daily are mostly linear algebra – PCA, SVD, factor analysis, LDA, GLMs, GAMs, SVMs, etc – but I forget the fundamental, elementary stuff.

Hello and goodbye to the J language
https://emiruz.com/post/2021-07-02-j/
Fri, 02 Jul 2021 00:00:00 +0000
I spent about 50 hours making things with a language called J. It’s an APL progeny and it promises to make it possible to express general programming tasks as if in mathematical notation. In J, arrays are first class citizens, and most functions natively support array operations. It also has fancy composition rules, so rather than the usual f(a,b), in J you have either f a or a f b.

Some less usual IQ scepticism
https://emiruz.com/post/2020-12-01-iq-rabbit-hole/
Tue, 01 Dec 2020 00:00:00 +0000
Introduction: The crux with IQ – so far as I understand it – is that performance across abstract reasoning tasks is correlated no matter what the tasks are. That is, being good at one type of abstract puzzle implies that you’re more likely to be good at any other such puzzle. If you got \(M\) people to do \(N\) puzzles and made an \(M\times N\) matrix of their scores, and then did some SVD on it, the first singular value would account for much of the weight, and if you were to rank the rows (participants) by their weight in the first left singular vector, you’d have created something akin to an IQ score.

About me
https://emiruz.com/about-me/
My name is Emir. I do research commercially, mostly by applying maths, stats and comp. sci. I’ve been at it about 6 years. I’m also a software engineer of 16+ years and an astronomer (PhD candidate). Aside from the PhD in progress, my academic background is in Analytical Philosophy (BA) and Applied Maths (GDip, MSc). My LinkedIn is here and my GitHub dumpster fire is here. You can contact me by email.