Article: Xml API to Elasticsearch with Foopipes 2017-06-11

Xml API to Elasticsearch with Foopipes

[This article was first published on Medium.]

Listening on a port or polling an API, transforming the data, and then pushing it to a data store or another service is a common task in a developer’s life. Foopipes is a lightweight tool for setting up these kinds of tasks with little or no programming. This is an example of how to poll an XML endpoint and then store the entities in a data store.


In Sweden the government has a monopoly on selling alcohol. That is often inconvenient, but on the flip side they have a pretty large selection of wines. They also claim to have an API with all their products, but it actually consists of only one large XML feed with all their products in a single document.

This is an example of how to use Foopipes to fetch all the products and put them in a searchable Elasticsearch document store in minutes, and how to expose a simple free text search API.

What is Foopipes?

Foopipes is a tool to obtain, transform and send data between services. It is an easy way of wiring services together in new and interesting ways, but also of setting up event chains in a workflow manner: when this happens, do this, and then do that.

It supports custom logic written in .NET and Node.js when the built-in tasks are not enough.

You can read more about Foopipes at https://foopipes.com/

Get it up and running

Foopipes runs in Docker. So first you’ll need to install Docker on your computer, but that is something you’ve probably done already? If not, do it now and welcome to the world of containers!

Normally a good start is to create a default Foopipes YAML configuration file in the current directory. For this example you can skip this part, but just to get you covered:

docker run aretera/foopipes config >foopipes.yml

Fetching and parsing the XML

Copy/paste this config to a file named foopipes.yml in the current directory.

pipelines: 
  - 
    when: 
      - { queue: started }
    from:
      - { http: "https://www.systembolaget.se/api/assortment/products/xml", format: xml }
    do:
      - { select: "$.artiklar.artikel[*]" }
    to:
      - { log }
    finally:
      - { exit }

Start Foopipes in Docker with one of these commands, depending on your environment. Each one mounts the current directory and starts the Foopipes Docker image, which automatically looks for a file named foopipes.yml and invokes your pipeline.

# bash:
docker run -v $(pwd):/project aretera/foopipes

# Powershell:
docker run -v ${PWD}:/project aretera/foopipes

# Windows cmd:
docker run -v %CD%:/project aretera/foopipes

What this configuration file does is to instruct Foopipes to:

  1. At startup, fetch XML from a URL. The XML is converted to JSON, as JSON is the native format in Foopipes.
  2. Select all articles using a JSONPath expression. Each article will be treated as a separate document in the pipeline from this point.
  3. Output the JSON to the log, which is normally the console window.
  4. Exit the process when all articles are processed.
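
To make these steps concrete, here is a rough Python sketch of the same flow. It is not how Foopipes is implemented. The sample XML is a made-up miniature of the feed, the helper names element_to_dict and select_articles are hypothetical, and the XML to JSON conversion rules are an assumption (repeated child tags become lists).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed; the real document is far larger.
SAMPLE_XML = """<artiklar>
  <artikel><Artikelid>1025919</Artikelid><Namn>Barolo</Namn><Namn2>Silvio Grasso</Namn2></artikel>
  <artikel><Artikelid>1041459</Artikelid><Namn>Barolo</Namn><Namn2>Colle Belvedere</Namn2></artikel>
</artiklar>"""


def element_to_dict(element):
    """Convert an element to nested dicts; repeated child tags become lists."""
    children = list(element)
    if not children:
        return element.text or ""
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                # Second occurrence of this tag: promote the value to a list.
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def select_articles(xml_text):
    """Rough equivalent of 'format: xml' followed by 'select: $.artiklar.artikel[*]'."""
    root = ET.fromstring(xml_text)
    document = {root.tag: element_to_dict(root)}
    articles = document["artiklar"]["artikel"]
    # A feed with a single <artikel> would not have been promoted to a list.
    return articles if isinstance(articles, list) else [articles]


articles = select_articles(SAMPLE_XML)
for article in articles:
    # Printing stands in for the log step; each article is its own document.
    print(json.dumps(article, ensure_ascii=False))
```

Each printed line corresponds to one document flowing through the pipeline after the select step.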

Store it into Elasticsearch

Log output is not very fascinating, so let’s move on and send the output to Elasticsearch instead. If you don’t have an Elasticsearch instance up and running, it is very easy to start one using Docker as well.

docker run --network host -p 9200 elasticsearch

We also need to expand our configuration a little. Replace the contents in foopipes.yml with this:

plugins: 
  - Elasticsearch

services: 
  elastic:
    type: elasticsearch
    url: "http://<your_ip>:9200"

pipelines: 
  - 
    when: 
      - { queue: started }
    from:
      - { http: "https://www.systembolaget.se/api/assortment/products/xml", format: xml }
    do:
      - { select: "$.artiklar.artikel[*]" }
    to:
      - { store: elastic, index: systembolaget, dataType: artikel, key: "#{Artikelid}" }
    finally:
      - { exit }

This config is very similar. The key differences are that it first loads the Elasticsearch plugin and configures a service named elastic. Then, instead of sending the result to the log, it stores it in the document store. As the document key it uses the data binding expression #{Artikelid}, which evaluates to the article’s id in the JSON data.
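
As an illustration of the data binding idea, the sketch below resolves #{Artikelid} against one article and builds the request that would store it. This is an assumption-laden sketch, not Foopipes code: the function names resolve and index_request are hypothetical, only top-level fields are supported, and the index/type/id path layout is the convention of Elasticsearch versions from that period rather than something confirmed for Foopipes.

```python
import re

# Matches a binding such as #{Artikelid} and captures the field name.
BINDING = re.compile(r"#\{([^}]+)\}")


def resolve(template, document):
    """Replace each #{field} with the matching top-level field of the document."""
    return BINDING.sub(lambda match: str(document[match.group(1)]), template)


def index_request(document, index="systembolaget", data_type="artikel", key="#{Artikelid}"):
    """Build (method, path, body) for storing one document under its resolved key."""
    doc_id = resolve(key, document)
    # Assumed path layout: /<index>/<type>/<id>
    return "PUT", f"/{index}/{data_type}/{doc_id}", document


article = {"Artikelid": "1025919", "Namn": "Barolo", "Namn2": "Silvio Grasso"}
method, path, body = index_request(article)
print(method, path)
```

Because the key comes from the article’s own id, running the pipeline again would overwrite the same documents rather than create duplicates.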

Start Foopipes and wait for the documents to be loaded into the database. 

Exposing a free text search API

Foopipes can also listen for and react to requests to an endpoint. We add another service, in this case an httplistener named api, as well as another pipeline that handles the requests.

services: 
  api:
    type: httplistener
    path: search/{searchstring}
    responseBody: json

pipelines: 
  - 
    when: 
      - { api }
    from:
      - { search: elastic, index: systembolaget, dataType: artikel, value: "#{route.searchstring}" }
    do:
      - { select: "$.hits[*]" }
      - { select, DisplayName: "#{Namn}, #{Namn2}", Id: "#{Artikelid}" }

When an HTTP request to http://localhost:5000/search/Barolo is received, Elasticsearch performs a free text search, and for each hit a condensed version of the product record is sent in the response.

[
  {
    "DisplayName": "Barolo, Silvio Grasso",
    "Id": "1025919"
  },
  {
    "DisplayName": "Barolo, Colle Belvedere",
    "Id": "1041459"
  },
  {
    "DisplayName": "Barolo La Tartufaia, Giulia Negri",
    "Id": "1018977"
  },
  ...
]
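
The two select steps in the search pipeline can be sketched in Python as follows. The shape of search_result is an assumption made so that $.hits[*] yields the product records, the Artikelid values are taken from the response above, and the helper name project is hypothetical. It only mimics the projection, not the search itself.

```python
import json
import re

# Matches a binding such as #{Namn} and captures the field name.
BINDING = re.compile(r"#\{([^}]+)\}")


def project(hit, fields):
    """Build a condensed record by resolving each field's binding template."""
    return {
        name: BINDING.sub(lambda match: str(hit[match.group(1)]), template)
        for name, template in fields.items()
    }


# Assumed result shape: a "hits" list holding the stored product records.
search_result = {
    "hits": [
        {"Artikelid": "1025919", "Namn": "Barolo", "Namn2": "Silvio Grasso"},
        {"Artikelid": "1018977", "Namn": "Barolo La Tartufaia", "Namn2": "Giulia Negri"},
    ]
}

# Same field templates as the second select step in the configuration.
FIELDS = {"DisplayName": "#{Namn}, #{Namn2}", "Id": "#{Artikelid}"}

response = [project(hit, FIELDS) for hit in search_result["hits"]]
print(json.dumps(response, indent=2, ensure_ascii=False))
```

The first select splits the result into one document per hit, and the second reduces each hit to the two fields returned to the caller.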

There you have it! No programming.

I should send this article to Systembolaget and tell them they can enhance their API for a fraction of the cost they normally spend on IT systems.

The complete configuration used in this example can be found here.