nginx – remove “no live upstreams” for your backup connections

In this article, we discuss the use of Nginx upstream module for HTTP and CGI (FastCGI) requests.

Using Nginx upstream module is essential for scaling application backend, but there might be a few catch-ups. One of them is related to what will happen when a server fails?
There is a proxy_next_upstream directive (or for FastCGI – fastcgi_next_upstream), which instructs the upstream module what is a fail – is it a connection error or timeout or HTTP 500 returned by the upstream server or an ordinary HTTP 404 returned by the upstream server is also a fail. So when a failure is identified by the Nginx upstream module the upstream module will look for the next upstream server to handle the request. These directives instruct Nginx upstream module what is a failure then to handle the next upstream server if available.
The default values are too conservative (and probably it is better to be like that):

proxy_next_upstream error timeout;

And available options are:

proxy_next_upstream error | timeout | invalid_header | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | non_idempotent | off ...;

They are the same for fastcgi_next_upstream, too.

Imagine you what to protect yourself from HTTP 500? Or even HTTP 403 and 404? It’s normal to include them in your configuration. But here is the catch-up:
What if an image is really missing (HTTP 404) on all your upstream backend servers? Or there is a syntax error in an application file (like a simple PHP file)? All your upstreams servers will return 404 or 500 (in case of application error) and all of them will be blacklisted for at least 20 seconds. Remember 404 or 500 is a failure and we need next upstream server and if all return failure for this particular request Nginx will return to the client there is a problem and will mark the server as down (unavailable for a period of time).

SO because of a single file (a problem in a single file), all the following requests will be denied with “502 Bad Gateway” or “500 Internal Server Error”, even your servers are healthy!

Just a tiny miss like a missing image could be misinterpreted as your upstream servers have problems, so they must be blocked! Even if you put a “backup” directive in the upstream server line!

The solution is to include one or more of your upstream servers with disabled failure count (fail_timeout=0s) as a backup server. So this server will be always available when all normal servers got blacklisted! And you are not going to receive any more “no live upstreams” and returning an error to your clients.
Here is a working configuration (it is the same for HTTP and FastCGI setups):

upstream backend {
     server   127.0.0.1:8000;
     server   10.10.10.10:9000 fail_timeout=20s;
     server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

And be careful what you add in proxy_next_upstream (or fastcgi_next_upstream). In general, HTTP 403 and 403 are not for this directive!

A real-world example (FastCGI)

One of our project using an Nginx with the upstream module to scale the PHP application backend began to serve only HTTP 502 to all clients! In the PHP logs, there was a rear syntax error on a single file (not in the main part of the site), but Nginx was answering to all requests, no matter of the URI with 502. What had happened? The two backend application servers had returned with 500 (because of this error) and were blacklisted for 20 seconds! And all the following requests were not served by the upstream backends because there were “no live upstreams”:

upstream backend-php {
        server   127.0.0.1:8000;
        server   178.63.22.46:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup;
}

with

fastcgi_next_upstream error timeout invalid_header http_500 http_503 http_429 non_idempotent;

Even we have a backup, it was also blacklisted. In fact, a scanner was scanning all of our PHP files and one URL returned syntax error, then all of the upstream servers were blacklisted and we experienced an effective DoS because of misconfiguration. After we changed our configuration to:

upstream backend-php {
        server   127.0.0.1:8000;
        server   10.10.10.10:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

Everything returned to normal. There was still this syntax error but did not stop all other valid URLs to be served. Of course, we fixed the broken file and stopped the scammer from scanning our site.

A real-world example 2 (HTTP)

In our proxy static cache servers in remote locations, we experienced periodically “no live upstreams” and our clients received “502 Bad Gateway” on-peak hours! The problem was we have too aggressive proxy connect, read and send timeout, but because we were serving a live TV we needed them. And on-peak if a single connection just huck-up for 5-10 seconds our upstream servers were blacklisted for 20 seconds! Using proxy_cache_lock could worsen the situation! Then we changed our configuration to have a backup upstream server, which effectively would not be blacklisted and lowered the proxy_cache_lock to be sure if a single connection failed for some reason all other might succeed in bringing the file to the cache!

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_403 http_404;
proxy_connect_timeout 2s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
proxy_cache_lock on;
proxy_cache_lock_timeout 20s;
proxy_cache_lock_age 10s;

with upstream configuration:

upstream backend_http {
        server 10.10.10.10 fail_timeout=20s;
        server 10.10.10.11 backup fail_timeout=0s max_fails=1000;
        keepalive 16;
}

List Openstack container’s options with the swift command-line client – capabilities command

First, you need to install

swift command-line utility

and second, install the command-line tool to manage your account: Install OpenStack swift client only
With the capabilities command you may discover the following important policy and limits of your account like:

  • Listing limits – how many files (objects) will be in the output when using list command.
  • The maximum file size, which is supported by the server.
  • Maximum files (objects) for deletion per a single request. How many files you can delete with a single request, which is very convinient to put thousands of files per one requests, not to initiate an http(s) connection for each file (object), which could be thousands of files, or even millions!
  • Additinal plugins (in terms of Openstack – middleware), which are supported
  • Maximum container name length

and many more.

In general, you will need:

  1. username (–os-username) – Username
  2. password (–os-password) – Password
  3. authentication url (–os-auth-url) – The URL address, which authorize your requests, it generates a security token for your operations. Always use https!
  4. tenant name (–os-tenant-name) – Tenant is like a project.

All of the above information should be available from your OpenStack administrator.
Here an example output of the capabalities command:

myuser@myserver:~$ swift --os-username myusr --os-tenant-name myusr --os-password mypass --os-auth-url https://auth01.example.com:5000/v2.0 capabilities
Core: swift
 Options:
  account_autocreate: True
  account_listing_limit: 20000
  allow_account_management: False
  container_listing_limit: 20000
  extra_header_count: 0
  max_account_name_length: 256
  max_container_name_length: 256
  max_file_size: 5368709122
  max_header_size: 8192
  max_meta_count: 90
  max_meta_name_length: 128
  max_meta_overall_size: 4096
  max_meta_value_length: 256
  max_object_name_length: 1024
  policies: [{'name': 'Policy-0', 'default': True}]
  strict_cors_mode: True
Additional middleware: bulk_delete
 Options:
  max_deletes_per_request: 20000
Additional middleware: bulk_upload
 Options:
  max_containers_per_extraction: 20000
  max_failed_extractions: 1000
Additional middleware: container_sync
 Options:
  realms: {}
Additional middleware: crossdomain
Additional middleware: formpost
Additional middleware: keystoneauth
Additional middleware: slo
 Options:
  max_manifest_segments: 1000
  max_manifest_size: 2097152
  min_segment_size: 1048576
Additional middleware: staticweb

You can see various middleware are activated with specific options – bulk_upload – to upload multiple files with one request (a list with files) and bulk_delete – to delete multiple files per one request and so on.

ansible – using ansible vault with copy module to decrypt on-the-fly files

Here is an interesting tip for all who what to protect the sensitive information with ansible. Our example is simple enough – we want to protect our private key and we want to decrypt it when installing on the server. The copy ansible module has a decrypt feature and it can decrypt the file on-the-fly when the task is executed.
Here is how to use ansible vault to encrypt the file with the private key and the ansible playbook file to copy the file.

If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook There you can see how to create your inventory file (and configure sudo if you remotely log in with unprivileged user) used herein the example.

STEP 1) Encrypt the file with ansible vault

myuser@srv ~ $ ansible-vault encrypt server.key
New Vault password: 
Confirm New Vault password: 
Encryption successful

You can see the file now is changed and starts with:

myuser@srv ~ $ cat server.key 
$ANSIBLE_VAULT;1.1;AES256
62363263663865646361643461663531373637386631646262366333663831643435633263363336
3735326665326363356566303566626638316662376432640a326362326230353966353431383164
35353531653331306430656562616165353632643330393662313535326438363964303436306639
....
....

STEP 2) Ansible playbook file to use copy and decrypt option

---
- hosts: all
  tasks:
    - name: Copy server private key
      copy:
        src: server.key
        dest: /etc/env/server.key
        decrypt: yes
        owner: root 
        group: root 
        mode: 400
        backup: no

STEP 3) Execute the ansible playbook

myuser@srv ~ $ ansible-playbook --ask-vault-pass -l srv3 -i ./inventory.ini ./playbook-example.yml -b
Vault password: 

PLAY [all] *****************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************
ok: [srv3]

TASK [Copy server private key] *********************************************************************************************************************************************
changed: [srv3]

PLAY RECAP *****************************************************************************************************************************************************************
srv3                       : ok=2    changed=1    unreachable=0    failed=0   

And the file in the remote server (srv3 in the example) is unencrypted in /etc/env/server.key!

ansible – restart a (nginx) service only if it is running and the configuration is ok

Another ansible quick tip showing how to restart a program properly. We want to restart the program or the service only if it is running (because some system on executing restart may start the service even it is in the stopped state).
Here is what the ansible playbook do:

  1. Check if the program is running.
  2. Check the configuration of the program. Do not restart a program or service if it cannot start after a stop command because of bad configuration file(s).
  3. Restart the service (the program) only if the above two are true.

If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook There you can see how to create your inventory file (and configure sudo if you remotely log in with unprivileged user) used herein the example.

Ansible YAML file

For our example we use the nginx webserver in the ansible playbook. Put the following code in a file and then execute ansible-playbook:

---
- hosts: all
  tasks:
            
    - name: Test for running nginx
      shell: ps axuf|grep 'nginx'|grep -v "grep" | tr -d "\n" | cat
      register: test_running_nginx
      changed_when: False
      tags: restart-nginx
      
    - name: First check the configuration
      shell: /usr/sbin/nginx -t
      register: test_nginx_config
      when: test_running_nginx.stdout != ""
      changed_when: False
      ignore_errors: True
      tags: restart-nginx
          
    - name: Restart nginx
      service: name=nginx state=restarted
      when: test_running_nginx.stdout != "" and test_nginx_config.rc == 0
      tags: restart-nginx

Here is how to run the above ansible playbook

myuser@srv ~ $ ansible-playbook -l srv2 -i ./inventory.ini ./playbook-example.yml -b

PLAY [all] *****************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************
ok: [srv2]

TASK [Test for running nginx] **********************************************************************************************************************************************
ok: [srv2]

TASK [First check the configuration] ***************************************************************************************************************************************
ok: [srv2]

TASK [Restart nginx] *******************************************************************************************************************************************************
changed: [srv2]

PLAY RECAP *****************************************************************************************************************************************************************
srv2                       : ok=4    changed=1    unreachable=0    failed=0   

Here we add to the command line “-b”, which will escalate to root if it is needed (using sudo) because the remote connection is done with unprivileged user “myuser”. You can skip this option if you described the remote connection with the root user in the inventory file (or a system user, which has permissions to restart services).
Keep on reading!

ansible – insert after only if the pattern exists and the new insert is not there

Here is a quick ansible tip for system administrators for the ansible lineinfile. Imagine you want to insert a line after a word (or a predefined marker in your configuration file), but you want to insert the line ONLY if the word exists!
It could be done with lineinfile module but there is a limitation. The module will insert after the first occurrence of your marker or at the end of the file. Here is what the manual says: “If specified regular expression has no matches, EOF will be used instead.” And what if you what to insert some additional line to your structured configuration file? It will corrupt your configuration file, so we need something else!
Not only this! Imagine you have already inserted the line in a previous playbook run? It will be unwanted to add the line, again and again, each time the playbook is run. So here we propose the following solution:

  1. Test for existance of the file you want to insert text.
  2. Test for the existance of the marker (aka tag) in the file.
  3. Test for the existance of the line we want to insert.
  4. Insert the line after the marker (aka tag) if all of the above three conditions are true.

Here we use three ansible modules – stat, shell, lineinfile and variables and conditional checks.
If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook There you can see how to create your inventory file (and configure sudo if you remotely log in with unprivileged user) used herein the example:

Ansible YAML file

---
- hosts: all
  tasks:
        - name: Test for nginx-config
          stat:
            path: /etc/nginx/nginx.conf
          register: test_exist_nginx_config
          tags: cors-insert-include
      
        - name: Test for \#FIRST-SRV-LOCATION tag
          shell: grep '#FIRST-SRV-LOCATION' /etc/nginx/nginx.conf | tr -d "\n" | cat
          register: test_first_srv_location
          when: test_exist_nginx_config.stat.exists
          changed_when: False
          tags: cors-insert-include

        - name: Test for cors-locations.loc inserted already
          shell: grep "cors-locations.loc" /etc/nginx/nginx.conf | tr -d "\n" | cat
          register: test_cors_locations_loc
          when: test_exist_nginx_config.stat.exists
          changed_when: False
          tags: cors-insert-include
          
        - name: Insert the includes after \#FIRST-SRV-LOCATION
          lineinfile:
            path: /etc/nginx/nginx.conf
            insertafter: '#FIRST-SRV-LOCATION'
            line: '                include /etc/nginx/conf.d/cors-locations.loc;'
            state: present
          when: test_exist_nginx_config.stat.exists and test_first_srv_location.stdout != "" and test_cors_locations_loc.stdout == ""
          tags: cors-insert-include

We want to insert a new include line after our predefined tag “#FIRST-SRV-LOCATION” in the nginx webserver’s main configuration file.

Here is how to run the above ansible playbook

Keep on reading!

First ansible use – install and execute a single command or multiple tasks in a playbook

This article is to show you how easy is to use automation tools for managing your servers. If you are new to ansible this article is right for you!

Installation

First, you must install ansible, which is pretty easy. At present all Linux distributions have the ansible package:

Ubuntu

sudo apt install ansible

CentOS 7

sudo yum install ansible

Fedora

sudo dnf install ansible

Gentoo

emerge -v ansible

Multiple python (version 3) packages will be pulled because the tool is based on python. The following files will appear in your machine (and a lot of python modules under the python directory of your Linux distribution):

/usr/bin/ansible
/usr/bin/ansible-config -> ansible
/usr/bin/ansible-connection
/usr/bin/ansible-console -> ansible
/usr/bin/ansible-doc -> ansible
/usr/bin/ansible-galaxy -> ansible
/usr/bin/ansible-inventory -> ansible
/usr/bin/ansible-playbook -> ansible
/usr/bin/ansible-pull -> ansible
/usr/bin/ansible-vault -> ansible

The important program name is ansible, with which you can do any of the other task.

What you can do using ansible with simple words

At present (July 2019) ansible 2.8.x has around 2080 modules (all modules here https://docs.ansible.com/ansible/latest/modules/list_of_all_modules.html) so you will find a solution for any automation task you may encounter. But here our purpose is to show you several simple commands.

ansible uses ssh to connect remotely to other machines and it is the best option to use ssh keys for passwordless connections

Still, ansible has the option to use also password authentication with “–ask-pass” option. In fact, connecting to the remote host could be done without ssh, but another protocol and this is beyond the scope of this article and it is rarely used.

Ansible modules could be used with different Linux distributions without specifying what kind of packaging software or init system is used.

So when you use module to install a package in your server you may not specify to use apt, yum or any other, or when you want to stop/start/reload/restart a service you do not need to specify it is a systemd or openrc or upstart or sysvinit and so on. The modules gather this information from the currently connected remote host and use the proper command to do its job. Look below in the playbook section.

The inventory file

The first thing to do is your file with servers. In terms of ansible, this is your “inventory file” – the file describing how to connect to your servers like hostname, ports, keys and so on.
The default inventory file is in /etc/ansible/hosts, but you can use file in any location if you include it in the ansible with “-i
So open your favorite text editor and write down your servers (it supports two syntaxes INI and YAML styles):

1) Just enumerate your servers’ hostnames.

Using default port 22 and the user you are logged in. Still, if you use “~/.ssh/config” and you included specific options like port, user, identity file these options will be used by ansible to connect to the hosts.

srv1.example.com
srv2.example.com

Keep on reading!

DBusException – Could not get owner of name ‘org.freedesktop.secrets’: no such name

There are programs, which heavily depend on a password store. Not sure why they cannot live without it but if you get errors of the following:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 175, in activate_name_owner
    return self.get_name_owner(bus_name)
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 361, in get_name_owner
    's', (bus_name,), **keywords)
  File "/usr/lib64/python3.6/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get owner of name 'org.freedesktop.secrets': no such name

The error is the same as:

The name org.freedesktop.secrets was not provided by any .service files

The chances are you are using a graphical interface, which does not start a dbus service offering “DBus Secret Service API”.
The solution is to start a program or service offering a password store, which implements the “DBus Secret Service API” offering the name “org.freedesktop.secrets”.

The best and easy workaround in the environment, which do not offer such service is to use gnome-keyring-daemon from the package gnome-keyring.

Just install the package gnome-keyring and start “gnome-keyring-daemon”:
Under Ubuntu:

sudo apt install gnome-keyring
gnome-keyring-daemon

CentOS 7 / Fedora

sudo yum install gnome-keyring
gnome-keyring-daemon

Gentoo:

root@local ~$ emerge -v gnome-keyring
root@local ~$ exit
myuser@local ~$ gnome-keyring-daemon

And after the keyring daemon has started the program, which failed before, now it would start normally.

Even KDE uses KWallet, which at present does not support “DBus Secret Service API” and you may experience such behavior under KDE Plasma Desktop. Execute the program in the console to see the output (aka the errors).

We have encountered this error with the latest version of nagstamon under KDE Plasma Desktop (5.15.5):

myuser@my-desktop ~ $ nagstamon 
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 175, in activate_name_owner
    return self.get_name_owner(bus_name)
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 361, in get_name_owner
    's', (bus_name,), **keywords)
  File "/usr/lib64/python3.6/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get owner of name 'org.freedesktop.secrets': no such name

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.6/nagstamon", line 31, in <module>
    from Nagstamon.Config import conf
  File "/usr/lib64/python3.6/site-packages/Nagstamon/Config.py", line 41, in <module>
    import keyring
  File "/usr/lib64/python3.6/site-packages/keyring/__init__.py", line 3, in <module>
    from .core import (set_keyring, get_keyring, set_password, get_password,
  File "/usr/lib64/python3.6/site-packages/keyring/core.py", line 154, in <module>
    init_backend()
  File "/usr/lib64/python3.6/site-packages/keyring/core.py", line 67, in init_backend
    keyrings = filter(limit, backend.get_all_keyring())
  File "/usr/lib64/python3.6/site-packages/keyring/util/__init__.py", line 21, in wrapper
    func.always_returns = func(*args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/keyring/backend.py", line 179, in get_all_keyring
    exceptions=TypeError))
  File "/usr/lib64/python3.6/site-packages/keyring/util/__init__.py", line 31, in suppress_exceptions
    for callable in callables:
  File "/usr/lib64/python3.6/site-packages/keyring/backend.py", line 171, in is_class_viable
    keyring_cls.priority
  File "/usr/lib64/python3.6/site-packages/keyring/util/properties.py", line 24, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib64/python3.6/site-packages/keyring/backends/SecretService.py", line 38, in priority
    list(secretstorage.get_all_collections(bus))
  File "/usr/lib64/python3.6/site-packages/secretstorage/collection.py", line 144, in get_all_collections
    service_obj = bus_get_object(bus, SS_PATH)
  File "/usr/lib64/python3.6/site-packages/secretstorage/util.py", line 55, in bus_get_object
    return bus.get_object(name, object_path, introspect=False)
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 241, in get_object
    follow_name_owner_changes=follow_name_owner_changes)
  File "/usr/lib64/python3.6/site-packages/dbus/proxies.py", line 248, in __init__
    self._named_service = conn.activate_name_owner(bus_name)
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 180, in activate_name_owner
    self.start_service_by_name(bus_name)
  File "/usr/lib64/python3.6/site-packages/dbus/bus.py", line 278, in start_service_by_name
    'su', (bus_name, flags)))
  File "/usr/lib64/python3.6/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.secrets exited with status 127

aptly publish ERROR: unable to publish: unable to process packages: error linking file to

We’ve encountered the following error when issuing a publish command:

aptly@aptly-server:~$ aptly --config=/mnt/storage/aptly/.aptly.conf publish snapshot xenial-myrepo-initial ubuntu
Loading packages...
Generating metadata files and linking package files...
ERROR: unable to publish: unable to process packages: error linking file to /mnt/storage/aptly/.aptly/public/ubuntu/pool/main/s/sftpcloudfs/sftpcloudfs_0.12.2-2_all.deb: file already exists and is different

And the snapshot had failed to publish. Check if the file is “aptly:aptly” (or the user and group your installation uses) because if someone has executed commands from the user root it may create some files with the user root (or other) and after that, some commands could fail. In our case, the file was with the right user for aptly and the solution was to remove the file manually (i.e. it is safe to remove it!) it was created again by the setup in the right time. Then execute the publish command again:

aptly@aptly-server:~$ rm /mnt/storage/aptly/.aptly/public/ubuntu/pool/main/s/sftpcloudfs/sftpcloudfs_0.12.2-2_all.deb 
aptly@aptly-server:~$ aptly --config=/mnt/storage/aptly/.aptly.conf publish snapshot xenial-myrepo-initial ubuntu
Loading packages...
Generating metadata files and linking package files...
Finalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:

Snapshot xenial-myrepo-initial has been successfully published.
Please setup your webserver to serve directory '/mnt/storage/aptly/.aptly/public' with autoindexing.
Now you can add following line to apt sources:
  deb http://your-server/ubuntu/ xenial-myrepo main
  deb-src http://your-server/ubuntu/ xenial-myrepo main
Don't forget to add your GPG key to apt with apt-key.

You can also use `aptly serve` to publish your repositories over HTTP quickly.

Common mistakes to appear this error are

  • File permissions
  • File ownership. As mentioned above aptly command executed by other user (like root). Probably it is a good idea to chown recursively the whole aptly root directory
  • Inerrupting the publish command execution
  • Inerrupting the drop command execution

The solution is simple, just remove the offensive file(s) and execute the command again. It is safe to remove the file manually.

Recovering MD array and mdadm: Cannot get array info for /dev/md0

What a case! A long story short one of our disks got a bad disk in a software RAID1 setup and when we tried replacing the disk in a recovery Linux console we got the strange error of an MD device:

mdadm: Cannot get array info for /dev/md125

And ccording to the /proc/mdstat the device was there and mdadm -E reported the array was “clean”.

root@631019 ~ # mdadm --add /dev/md125 /dev/sdb2
mdadm: Cannot get array info for /dev/md125

root@631019 ~ # cat /proc/mdstat                                                        πŸ™
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10] 
md122 : inactive sda4[0](S)
      33520640 blocks super 1.2
       
md123 : inactive sda5[0](S)
      1914583040 blocks super 1.2
       
md124 : inactive sda3[0](S)
      4189184 blocks super 1.2
       
md125 : inactive sda2[0](S)
      1048512 blocks
       
unused devices: <none>

root@631019 ~ # mdadm -E /dev/sda2                                                      πŸ™
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : aff708ee:16669ffb:1a120e13:7e9185ae
  Creation Time : Thu Mar 14 15:10:21 2019
     Raid Level : raid1
  Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
     Array Size : 1048512 (1023.94 MiB 1073.68 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Thu Jul 11 10:22:17 2019
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c1ee0a10 - correct
         Events : 103


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2

The important piece of information here is that the RAID1 is in an inactive state, which is really strange! It is perfectly normal to be started with one disk missing (the raid as you can see consists from 2 disks) and in read-only mode before mounting it. But here it is in an inactive state! The output of /proc/mdstat shows a sign of inappropriate assembly of all those arrays probably during the boot of the rescue Linux system – missing information or old version of mdadm utility or some other configuration loaded! In such states – inactive and as you see no information about the type of the arrays it is normal mdadm to report error it could not get current array info. The key word here is CURRENT despite mdadm misses it in the error output:

root@631019 ~ # mdadm --add /dev/md125 /dev/sdb2
mdadm: Cannot get array info for /dev/md125

Because in fact mdadm tries adding a disk in the currently loaded configuration, not the real one in your disks!

The solution

  1. Remove ALL current configuration by issuing multiple stop commands with mdadm, no inactive raids or any raids should be reported in “/proc/mdstat”.
  2. Remove (or better rename) mdadm configuration files in /etc/mdadm.conf (in some Linux distributions is /etc/mdadm/mdadm.conf).
  3. Rescan for MD devices with mdadm. The mdadm will load the configuration from your disks.
  4. Add the missing partitions to your software raid devices.

Keep on reading!

aptly publish: gpg: no default secret key: secret key not available

This is also a common error in a typical aptly installation. The other two common errors related to the GPG keys are: aptly publish: ERROR: unable to initialize GPG signer. Missing pubring.gpg keys and aptly mirror – gpgv: Can’t check signature: public key not found. This secret key is used when you try to publish a repository (snapshot or mirror).

root@srv-aptly ~ # aptly publish snapshot xenial-myrepo-initial
Loading packages...
Generating metadata files and linking package files...
 15683 / 107250 [====================>--------------------------------------------------------------------------------------------------------------------]  14.62% 2h53m50s 
17025 / 107250 [=====================>--------------------------------------------------------------------------------------------------------------------]  15.87% 3h5m15sFinalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
gpg: no default secret key: secret key not available
gpg: signing failed: secret key not available
ERROR: unable to publish: unable to detached sign file: exit status 2

You are unable to sign the Release file because the keyring secring.gpg is missing a GPG key. Just create or import from your current servers the GPG key from keyring secring.gpg (for the root user it is /root/.gnupg/secring.gpg and in general this is the default path /[my-aptly-home-directory]/.gnupg/secring.gpg).

Here is the example with the two servers, exporting from your current and importing the key in your new (the second) server:

Export the secring.gpg GPG key from your server
root@srv-aptly-1:~ # gpg --list-keys --keyring secring.gpg
/root/.gnupg/secring.gpg
------------------------
pub   2048D/FDC7A25E 2017-09-16
uid                  My-aptly (aptly key no passphrase) <my-aptly@example.com>

root@srv-aptly-1:~ # gpg --keyring secring.gpg --export --armor FDC7A25E > FDC7A25E.key
root@srv-aptly-1:~ # gpg --list-secret-keys --keyring secring.gpg
/root/.gnupg/secring.gpg
------------------------
sec   2048D/FDC7A25E 2017-09-16
uid                  My-aptly (aptly key no passphrase) <my-aptly@example.com>

root@srv-aptly-1:~ # gpg --keyring secring.gpg --export-secret-key --armor FDC7A25E > FDC7A25E.sec

First is the public key (FDC7A25E.key) and second is the private key (FDC7A25E.sec). You must export them both and import them in your new server (or look below how to generate them in your server).

Copy the file to the second server (FDC7A25E.key) and then import it in keyring secring.gpg
root@srv-aptly-2:~ # cat ./FDC7A25E.key| gpg --keyring secring.gpg --import
gpg: key FDC7A25E: public key "My-aptly (aptly key no passphrase) <my-aptly@example.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
root@srv-aptly-2:~ # gpg --keyring secring.gpg --allow-secret-key-import --armor --import FDC7A25E.sec 
gpg: key FDC7A25E: secret key imported
gpg: key FDC7A25E: "My-aptly (aptly key no passphrase) <my-aptly@example.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1

And now you can publish your repository with:

root@srv-aptly-2: ~ # aptly publish snapshot xenial-myrepo-initial ubuntu
Loading packages...
Generating metadata files and linking package files...
Finalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:

Snapshot xenial-myrepo-initial has been successfully published.
Please setup your webserver to serve directory '/mnt/storage/aptly/.aptly/public' with autoindexing.
Now you can add following line to apt sources:
  deb http://your-server/ubuntu/ xenial-myrepo main
  deb-src http://your-server/ubuntu/ xenial-myrepo main
Don't forget to add your GPG key to apt with apt-key.

You can also use `aptly serve` to publish your repositories over HTTP quickly.

The operation publish passed successfully.

Generate GPG Key

If you just came here installing a new aptly server and getting this error as mentioned above you miss a GPG key in keyring secring.gpg.

root@srv-aptly: ~# gpg --default-new-key-algo rsa4096 --gen-key --keyring secring.gpg
gpg (GnuPG) 2.2.11; Copyright (C) 2018 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: My-aptly
Email address: my-aptly@example.com
You selected this USER-ID:
    "MyName <my-aptly@example.com>"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? O
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: key B14B67D0CF27191B marked as ultimately trusted
gpg: revocation certificate stored as '/root/.gnupg/openpgp-revocs.d/77EC42A1F16127C83509292BB14B67D0CF27191B.rev'
public and secret key created and signed.

Note that this key cannot be used for encryption.  You may want to use
the command "--edit-key" to generate a subkey for this purpose.
pub   rsa4096 2019-07-08 [SC] [expires: 2021-07-07]
      77EC42A1F16127C83509292BB14B67D0CF27191B
uid                      MyName <my-aptly@example.com>

NOTE

Just to note here we give you all the examples with the root user and the GPG keys are for the root user. You may use a different user for the aptly process and you must ensure the GPG keys to present for this user (the directories and files are the same, just home directory is different – the home directory of the aptly user i.e. “/[my-aptly-home-directory]/.gnupg/secring.gpg” and for all other GPG files “/[my-aptly-home-directory]/.gnupg/”).