Author Archives: paulborile

About paulborile

I’m a multi-skilled IT professional with good all-round supervisory and technical expertise. More than 20 years of professional experience in software development have allowed me to explore computer science and software engineering inside out. During these years I built up a solid base of design patterns, software architectures and programming languages such as C/C++, Golang, Java, Python, SQL, Assembly (and many others). I worked on mission-critical and multi-channel applications, applying distributed computing, messaging, image/data processing and computer graphics techniques. I have repeatedly dealt with architecture design and system re-architecting, microservices introduction and technology migration, as well as company-wide adoption of new technologies and methodologies. As an entrepreneur I have built and grown teams and development organizations from the ground up (internal, outsourced, at customer site), focusing on software engineering methodologies as well as recruiting, budget/financial control and operations support. I am particularly interested in software testing methodologies, software quality metrics and tools that make software development faster and better. I currently lead the Italian development team for ScientiaMobile Inc, a Reston (US) based startup focused on image-optimizing CDN and mobile detection technologies and services. Born in Dearborn, Michigan, and living in Italy for many years now, I speak both English and Italian fluently, studied French, and learned some Russian while working for some time on an Olivetti/Aeroflot project.

The Pragmatic Programmer

I think this book is full of valuable ideas, and I would like to recap some of them in this post:

A broken window
One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment—a sense that the powers that be don’t care about the building. So another window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner’s desire to fix it, and the sense of abandonment becomes reality.

This applies to software all the time: you can have the best design guidelines, but leaving a broken window (bad design, wrong decisions, poor code) unrepaired will slowly propagate that error into all the new code that gets written.

Know when to stop

In some ways, programming is like painting. You start with a blank canvas and certain basic raw materials. You use a combination of science, art, and craft to determine what to do with them. You sketch out an overall shape, paint the underlying environment, then fill in the details. You constantly step back with a critical eye to view what you’ve done. Every now and then you’ll throw a canvas away and start again.
But artists will tell you that all the hard work is ruined if you don’t know when to stop. If you add layer upon layer, detail over detail, the painting becomes lost in the paint.

I read this as: don’t over-engineer. Let your code do its job for a while, and don’t over-refine it.

DRY (Don’t Repeat Yourself)

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

We all know this, right? But it is not just a matter of duplicating code: it is about duplicating knowledge.
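
To make the code-versus-knowledge distinction concrete, here is a minimal Go sketch. The user-name rule and every name in it (maxNameLen, validName, nameError) are hypothetical illustrations of mine, not taken from the book. The rule "a user name is 1 to 16 characters" is written down exactly once, and both the check and the message shown to users are derived from it, so changing the limit cannot leave them out of sync.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxNameLen is the single, authoritative representation of the
// (hypothetical) rule "a user name is at most 16 characters long".
const maxNameLen = 16

// validName enforces the rule by referring to maxNameLen; it never restates 16.
// Characters are counted as runes, so non-ASCII names are measured correctly.
func validName(name string) bool {
	n := utf8.RuneCountInString(name)
	return n > 0 && n <= maxNameLen
}

// nameError builds the user-facing message from the same constant,
// so the message can never drift away from what validName enforces.
func nameError(name string) string {
	if validName(name) {
		return ""
	}
	return fmt.Sprintf("name must be 1-%d characters", maxNameLen)
}

func main() {
	for _, n := range []string{"paul", "", "a-very-long-user-name"} {
		fmt.Printf("%q -> %q\n", n, nameError(n))
	}
}
```

Copy-pasting `16` into the message string would not duplicate any code, yet it would duplicate the knowledge, which is exactly what DRY warns against.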

Orthogonality

In computing, the term has come to signify a kind of independence or decoupling. Two or more things are orthogonal if changes in one do not affect any of the others. In a well-designed system, the database code will be orthogonal to the user interface: you can change the interface without affecting the database, and swap databases without changing the interface.

You are probably already familiar with orthogonality (modular, component-based, and layered are near-synonyms). I read this as: think of your module/component as a service that exposes an API to its users. This gives you:

  • efficient development (nobody has to wait for someone else to get their part done)
  • easy testing: orthogonal components can be tested independently
  • an API that is easy to understand and use
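
A minimal Go sketch of the database/UI example from the quote, under assumptions of my own: the Store interface, the in-memory memStore and the render function are hypothetical names. The "UI" depends only on the Store API, so a different storage backend can be dropped in without touching it, and each side can be tested on its own.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Store is the API the rest of the program sees (hypothetical interface).
// Swapping the database means writing another type that satisfies it;
// code that depends only on Store does not change.
type Store interface {
	Put(key, value string)
	Get(key string) (string, bool)
	Keys() []string
}

// memStore is an in-memory implementation, handy for tests.
type memStore struct {
	m map[string]string
}

// Compile-time check that *memStore satisfies Store.
var _ Store = (*memStore)(nil)

func newMemStore() *memStore {
	return &memStore{m: make(map[string]string)}
}

func (s *memStore) Put(key, value string) { s.m[key] = value }

func (s *memStore) Get(key string) (string, bool) {
	v, ok := s.m[key]
	return v, ok
}

// Keys returns the keys sorted, so callers get a stable order.
func (s *memStore) Keys() []string {
	keys := make([]string, 0, len(s.m))
	for k := range s.m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// render plays the role of the user interface: it only talks to the
// Store API and knows nothing about how the data is persisted.
func render(s Store) string {
	var b strings.Builder
	for _, k := range s.Keys() {
		v, _ := s.Get(k)
		fmt.Fprintf(&b, "%s=%s\n", k, v)
	}
	return b.String()
}

func main() {
	s := newMemStore()
	s.Put("lang", "go")
	s.Put("editor", "vim")
	fmt.Print(render(s))
}
```

Testing render needs no real database at all: any small fake that satisfies Store will do, which is the "easy testing" benefit listed above.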

To be continued!

Upload filters aka The Censorship machine

Since the end of 2016 a proposal for an EU directive on copyright in the Digital Single Market has been under discussion in the European Parliament. As part of this proposal, Article 13 introduces a new concept:

Internet platforms hosting “large amounts” of user-uploaded content must monitor user behavior and filter their contributions to identify and prevent copyright infringement.

As you may imagine, this changes the game quite a lot.

Let’s take an example: a holder of music rights may ask platforms like www.soundcloud.com (Germany) to keep watch over a set of their works. SoundCloud would then have to start monitoring all uploads to make sure that those works are not uploaded by anyone on its platform.

If this regulation passes, its impact on the economies of EU countries will be significant. Let’s try to list some points:

  1. Putting the whole burden of control on internet platforms that host content will probably result in:
    • EU companies finding it much harder to compete with US/Asian content providers
    • new startups and existing companies moving away from EU countries to avoid having to comply with the regulation

  2. Filtering technology is too vast and complicated for every single content provider to build on its own: hundreds of rightholders requiring control over multiple kinds of data (text, images, audio, video, music scores, software code) will create the need for content-checking providers that will de facto hold censorship power.

  3. Guilty until proven innocent: if a filter erroneously blocks legal content, it will be up to the content owner to fight to get it reinstated.
  4. False positives: as with all automated checking procedures, the number of false positives could be extremely high, resulting in a restriction of freedom of expression.

There are many campaigns on this topic:

  • Save Your Internet: “Stand up and ask Europe to protect Your Internet” (offers a contact-your-MEP tool)
  • Say No to Online Censorship by the Civil Liberties Union for Europe: “Act now! It’s about our freedom to speak. It’s about censorship.” (offers an email-your-MEP tool)
  • #SaveTheMeme, referring to parodies and other expressions of web culture that may be removed by such filtering technology
  • Create•Refresh: “These changes put the power of small, independent creators in jeopardy. Creative expression will effectively be censored, leaving only the bigger, more established players protected. Many of the sites that we use every day for information or entertainment may cease to exist.”
  • Save Codeshare

Thanks to Julia Reda (Pirate Party, EU Parliament member) for a lot of information on this topic.

Allocating memory inside a Varnish vmod


Writing Varnish modules is pretty well documented by the standard Varnish documentation and tutorials, and thanks to valuable work from other people here. There are some areas I felt needed further clarification, and this post tries to provide it.

Allocating memory inside a vmod is tricky if you need it to be freed when the current request is destroyed. Here are some ways to do it:

  • per-request memory allocation (i.e. the scope is the request lifetime, so memory is freed when the request is destroyed), using the WS_* workspace functions:
void WS_Init(struct ws *ws, const char *id, void *space, unsigned len);
unsigned WS_Reserve(struct ws *ws, unsigned bytes);
void WS_MarkOverflow(struct ws *ws);
void WS_Release(struct ws *ws, unsigned bytes);
void WS_ReleaseP(struct ws *ws, char *ptr);
void WS_Assert(const struct ws *ws);
void WS_Reset(struct ws *ws, char *p);
char *WS_Alloc(struct ws *ws, unsigned bytes);
void *WS_Copy(struct ws *ws, const void *str, int len);
char *WS_Snapshot(struct ws *ws);
int WS_Overflowed(const struct ws *ws);
void *WS_Printf(struct ws *ws, const char *fmt, ...) __printflike(2, 3);

This memory comes from the request workspace (ctx->ws), so no free is necessary: the data is discarded when the request is destroyed. For example:

VCL_STRING
vmod_hello(const struct vrt_ctx *ctx, VCL_STRING name)
{
   char *p;
   unsigned u, v;

   u = WS_Reserve(ctx->ws, 0); /* Reserve some work space */
   p = ctx->ws->f;         /* Front of workspace area */
   v = snprintf(p, u, "Hello, %s", name);
   v++;
   if (v > u) {
      /* No space, reset and leave */
      WS_Release(ctx->ws, 0);
      return (NULL);
   }
   /* Update work space with what we've used */
   WS_Release(ctx->ws, v);
   return (p);
}

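Data is allocated in the ctx->ws area. The workspace has a fixed size, set by Varnish parameters (workspace_client for client requests, workspace_backend for backend fetches): if an allocation does not fit, it fails and the workspace is marked as overflowed (see WS_Overflowed()).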

  • (Varnish 4.0 and later) Private pointers: a way to keep private data with different scopes, e.g. per VCL or per task. You can access private data either as an argument passed through the VCL function signature or by calling VRT_priv_task(ctx, "name") directly, for example to obtain a per-request place to hold:
    • a free function
    • a pointer to the allocated data

This method is very useful if you need a cleanup function to be called when the Varnish request is destroyed.

WebAssembly/wasm and asm.js


WebAssembly: I’ll try to clarify a few things I learned while working with it:

  1. WASM: short for WebAssembly, a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for high-level languages like C/C++/Rust, so that they can run on the Web. Reference here
  2. asm.js: a strict subset of JavaScript, statically typed and highly optimizable, created to allow applications written in higher-level languages like C to run on the Web. Reference here and here

So you might say that 1 and 2 have the same purpose: AFAIK, yes. You can also convert asm.js to wasm and (theoretically) decode wasm back to asm.js. It seems that WASM, rather than asm.js, is the one that will be extended in the future.

Let’s continue:

  1. emscripten: a toolchain to compile high-level languages to asm.js and WASM. It uses LLVM, also performs some API conversion (OpenGL to WebGL, for example), and compiles to LLVM IR (LLVM bitcode) and then from LLVM IR bitcode to asm.js using Fastcomp.
  2. Binaryen (asm2wasm): compiles asm.js to wasm and is included in emscripten.

If you have a C/C++ project made of several libraries, I suggest compiling each component to LLVM IR bitcode and generating asm.js/wasm only during the link phase. This lets you keep your build/link steps as they would be in a standard object-code generation environment.
emscripten/LLVM offer a full set of tools to compile and work on IR bitcode if you like:

  • emmake: use your existing makefiles by running emmake make
  • emconfigure: use your existing configure script by running emconfigure ./configure <options>

And if you want to dig deeper into LLVM:

  • lli: directly executes programs in LLVM bitcode format, using a just-in-time compiler or an interpreter
  • llc: compiles LLVM source inputs into assembly language for a specified architecture. The assembly output can then be passed through a native assembler and linker to generate a native executable

Once all your libraries/components are compiled to LLVM IR bitcode, you have to generate WASM. The basic compile command is:

emcc -s WASM=1 -o <prog>.html <prog>.c -l<anylibraryyouneed>

but:

  1. If your program allocates memory dynamically (malloc/free) and may need more than the initial heap size, you need to add: -s ALLOW_MEMORY_GROWTH=1
  2. If you are using pthreads in your code/libraries, you need to add: -s USE_PTHREADS=1, but as of Jan 2019 you can’t combine pthreads with memory growth (ALLOW_MEMORY_GROWTH). More info here.

More to come soon.

 


Profiling a golang REST API server

Profiling:

is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

How can you profile your Go REST API server in a super simple way?

First, add a few lines to your server code:

import _ "net/http/pprof"

Then add a listener (I normally use a command-line flag to enable it):

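go func() {
	// nil handler = http.DefaultServeMux, where net/http/pprof registers its handlers;
	// log the error (needs the "log" package) so a busy port does not fail silently
	log.Println(http.ListenAndServe("localhost:6000", nil))
}()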

Start your server and generate some load. While your code is running under that load, extract the profiler data (by default the profile endpoint samples the CPU for 30 seconds):

go tool pprof http://localhost:6000/debug/pprof/profile
Fetching profile over HTTP from http://localhost:6000/debug/pprof/profile
Saved profile in /home/paul/pprof/pprof.wm-server.samples.cpu.008.pb.gz
File: wm-server
Build ID: c806572b51954da99ceb779f6d7eee3600eae0fb
Type: cpu
Time: Dec 19, 2018 at 1:41pm (CET)
Duration: 30.13s, Total samples = 17.35s (57.58%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

You have many commands available at this point, but since I have used kcachegrind for years, what I prefer to do is fire it up with the kcachegrind command:

(pprof) kcachegrind

This generates a callgrind-formatted file and runs kcachegrind on it, letting you do all the usual analysis you’re probably already used to (call graph, callers, callees, ...).