About the author

My name is Solène Rapenne. I like to learn and share my knowledge with other. With this blog I can share my experiences and issues. Some of my interests : '(BSD Lisp Emacs cli-tool Web-infrastructure Gaming Crossbow). I love % and lambda characters.

Contact : solene on Freenode or solene+www at dataswamp dot org

This website is generated using cl-yag. A gopher version is available here

How to check your data integrity ?

Written by Solène, on 17 March 2017.
Tags: #unix #security #data

Today, the topic is data degradation, bit rot, birotting, damaged files or whatever you call it. It’s when your data get corrupted over the time, due to disk fault or some unknown reason.

What is data degradation ?

I shamelessy paste one line from wikipedia : “Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay or data rot.”.

Data degradation on Wikipedia

So, how do we know we encounter a bit rot ?

bit rot = (checksum changed) && NOT (modification time changed)

While updating a file could be mistaken as bit rot, there is a difference

update = (checksum changed) && (modification time changed)

How to check if we encounter bitrot ?

There is no way you can prevent bitrot. But there are some ways to detect it, so you can restore a corrupted file from a backup, or repair it with the right tool (you can’t repair a file with a hammer, except if it’s some kind of HammerFS ! :D )

In the following I will describe software I found to check (or even repair) bitrot. If you know others tools which are not in this list, I would be happy to hear about it, please mail me.

In the following examples, I will use this method to generate bitrot on a file :

% touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
% generate_checksum_database_with_tool
% echo "a" >> my_data/some_file_that_will_be_corrupted
% touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
% start_tool_for_checking

We generate the checksum database, then we alter a file by adding a “a” at the end of the file and we restore the modification and acess time of the file. Then, we start the tool to check for data corruption.

The first touch is only for convenience, we could get the modification time with stat command and pass the same value to touch after modification of the file.

bitrot

This is a python script, it’s very easy to use. I will scan a directory and create a database with the checksum of the files and their modification date.

Initialization usage :

% cd /home/my_data/
% bitrot
Finished. 199.41 MiB of data read. 0 errors found.
189 entries in the database, 189 new, 0 updated, 0 renamed, 0 missing.
Updating bitrot.sha512... done.
% echo $?
0

Verify usage (case OK) :

% cd /home/my_data/
% bitrot
Checking bitrot.db integrity... ok.
Finished. 199.41 MiB of data read. 0 errors found.
189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
% echo $?
0

Exit status is 0, so our data are not damaged.

Verify usage (case Error) :

% cd /home/my_data/
% bitrot
Checking bitrot.db integrity... ok.
error: SHA1 mismatch for ./sometextfile.txt: expected 17b4d7bf382057dc3344ea230a595064b579396f, got db4a8d7e27bb9ad02982c0686cab327b146ba80d. Last good hash checked on 2017-03-16 21:04:39.
Finished. 199.41 MiB of data read. 1 errors found.
189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
error: There were 1 errors found.
% echo $?
1

When something is wrong. As the exit status of bitrot isn’t 0 when it fails, it’s easy to write a script running every day/week/month.

Github page

bitrot is available in OpenBSD ports in sysutils/bitrot since 6.1 release.

par2cmdline

This tool works with PAR2 archives (see below for more informations about what PAR ) and from them, it will be able to check your data integrity AND repair it.

While it has some pros like being able to repair data, the cons is that it’s not very easy to use. I would use this one for checking integrity of long term archives that won’t changes. The main drawback comes from PAR specifications, the archives are created from a filelist, if you have a directory with your files and you add new files, you will need to recompute ALL the PAR archives because the filelist changed, or create new PAR archives only for the new files, but that will make the verify process more complicated. That doesn’t seems suitable to create new archives for every bunchs of files added in the directory.

PAR2 let you choose the percent of a file you will be able to repair, by default it will create the archives to be able to repair up to 5% of each file. That means you don’t need a whole backup for the files (while it’s would be a bad idea) and only an approximately extra of 5% of your data to store.

Create usage :

% cd /home/
% par2 create -a integrity_archive -R my_data
Skipping 0 byte file: /home/my_data/empty_file

Block size: 3812
Source file count: 17
Source block count: 2000
Redundancy: 5%
Recovery block count: 100
Recovery file count: 7

Opening: my_data/[....]
[text cut here]
Opening: my_data/[....]

Computing Reed Solomon matrix.
Constructing: done.
Wrote 381200 bytes to disk
Writing recovery packets
Writing verification packets
Done

% echo $?
0

% ls -1
integrity_archive.par2
integrity_archive.vol000+01.par2
integrity_archive.vol001+02.par2
integrity_archive.vol003+04.par2
integrity_archive.vol007+08.par2
integrity_archive.vol015+16.par2
integrity_archive.vol031+32.par2
integrity_archive.vol063+37.par2
my_data

Verify usage (OK) :

% par2 verify integrity_archive.par2 
Loading "integrity_archive.par2".
Loaded 36 new packets
Loading "integrity_archive.vol000+01.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "integrity_archive.vol001+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "integrity_archive.vol003+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "integrity_archive.vol007+08.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "integrity_archive.vol015+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "integrity_archive.vol031+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "integrity_archive.vol063+37.par2".
Loaded 37 new packets including 37 recovery blocks
Loading "integrity_archive.par2".
No new packets found

There are 17 recoverable files and 0 other files.
The block size used was 3812 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 7595275 bytes.

Verifying source files:

Target: "my_data/....." - found.
[...cut here...]
Target: "my_data/....." - found.


All files are correct, repair is not required.
% echo $?
0

Verify usage (with error) :

par2 verify integrity_archive.par.par2                                                 
Loading "integrity_archive.par.par2".
Loaded 36 new packets
Loading "integrity_archive.par.vol000+01.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "integrity_archive.par.vol001+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "integrity_archive.par.vol003+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "integrity_archive.par.vol007+08.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "integrity_archive.par.vol015+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "integrity_archive.par.vol031+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "integrity_archive.par.vol063+37.par2".
Loaded 37 new packets including 37 recovery blocks
Loading "integrity_archive.par.par2".
No new packets found

There are 17 recoverable files and 0 other files.
The block size used was 3812 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 7595275 bytes.

Verifying source files:


Target: "my_data/....." - found.
[...cut here...]
Target: "my_data/....." - found.
Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath l'Inconnue.epub" - damaged. Found 95 of 95 data blocks.

Scanning extra files:


Repair is required.
1 file(s) exist but are damaged.
16 file(s) are ok.
You have 2000 out of 2000 data blocks available.
You have 100 recovery blocks available.
Repair is possible.
You have an excess of 100 recovery blocks.
None of the recovery blocks will be used for the repair.

% echo $?
1

Repair usage :

% par2 repair integrity_archive.par.par2      
Loading "integrity_archive.par.par2".
Loaded 36 new packets
Loading "integrity_archive.par.vol000+01.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "integrity_archive.par.vol001+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "integrity_archive.par.vol003+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "integrity_archive.par.vol007+08.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "integrity_archive.par.vol015+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "integrity_archive.par.vol031+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "integrity_archive.par.vol063+37.par2".
Loaded 37 new packets including 37 recovery blocks
Loading "integrity_archive.par.par2".
No new packets found

There are 17 recoverable files and 0 other files.
The block size used was 3812 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 7595275 bytes.

Verifying source files:

Target: "my_data/....." - found.
[...cut here...]
Target: "my_data/....." - found.
Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath l'Inconnue.epub" - damaged. Found 95 of 95 data blocks.

Scanning extra files:


Repair is required.
1 file(s) exist but are damaged.
16 file(s) are ok.
You have 2000 out of 2000 data blocks available.
You have 100 recovery blocks available.
Repair is possible.
You have an excess of 100 recovery blocks.
None of the recovery blocks will be used for the repair.


Wrote 361069 bytes to disk

Verifying repaired files:

Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath l'Inconnue.epub" - found.

Repair complete.

% echo $?
0

par2cmdline is only one implementation doing the job, others tools working with PAR archives exists. They should be able to all works with the same PAR files.

Parchive on Wikipedia

Github page

par2cmdline is available in OpenBSD ports in archivers/par2cmdline.

If you find a way to add new files to existing archives, please mail me.

mtree

One can write a little script using mtree (in base system on OpenBSD and FreeBSD) which will create a file with the checksum of every files in the specified directories. If mtree output is different since last time, we can send a mail with the difference. This is a process done in base install of OpenBSD for /etc and some others files to warn you if it changed.

While it’s suited for directories like /etc, in my opinion, this is not the best tool for doing integrity check.

ZFS

I would like to talk about ZFS and data integrity because this is where ZFS is very good. If you are using ZFS, you may not need any other software to take care about your data. When you write a file, ZFS will also store its checksum as metadata. By default, the option “checksum” is activated on dataset, but you may want to disable it for better performance.

There is a command to ask ZFS to check the integrity of the files. Warning : scrub is very I/O intensive and can takes from hours to days or even weeks to complete depending on your CPU, disks and the amount of data to scrub :

# zpool scrub zpool

The scrub command will recompute the checksum of every file on the ZFS pool, if something is wrong, it will try to repair it if possible. A repair is possible in the following cases :

If you have multiple disks like raid-Z or raid–1 (mirror), ZFS will be look on the differents disks if the non corrupted version of the file exists, if it finds it, it will restore it on the disk(s) where it’s corrupted.

If you have set the ZFS option “copies” to 2 or 3 (1 = default), that means that the file is written 2 or 3 time on the disk. Each file of the dataset will be allocated 2 or 3 time on the disk, so take care if you want to use it on a dataset containing heavy files ! If ZFS find thats a version of a file is corrupted, it will check the others copies of it and tries to restore the corrupted file is possible.

You can see the percentage of filesystem already scrubbed with

zfs status zpool

and the scrub can be stopped with

zfs scrub -s zpool

AIDE

Its name is an acronym for “Advanced Intrusion Detection Environment”, it’s an complicated software which can be used to check for bitrot. I would not recommend using it if you only need bitrot detection.

Here is a few hints if you want to use it for checking your file integrity :

/etc/aide.conf

/home/my_data/ R
# Rule definition
All=m+s+i+sha256
summarize_changes=yes

The config file will create a database of all files in /home/my_data/ (R for recursive). “All” line list the checks we do on each file. For bitrot checking, we want to check modification time, size, checksum and inode of the files. The summarize_change line permit to have a list of changes if something is wrong.

This is the most basic config file you can have. Then you will have to run aide to create the database and then run aide to create a new database and compare the two databases. It doesn’t update its database itself, you will have to move the old database and tell it where to found the older database.

My use case

I have different kind of data. On a side, I have static data like pictures, clips, music or things that won’t change over time and the other side I have my mails, documents and folders where the content changes regularly (creation, deletetion, modification). I am able to afford a backup for 100% of my data with some history of the backup on a few days, so I won’t be interested about file repairing.

I want to be warned quickly if a file get corrupted, so I can still get the backup in my history but I don’t keep every versions of my files for too long. I choose to go with the python tool bitrot, it’s very easy to use and it doesn’t become a mess with my folders getting updated often.

I would go with par2cmdline if I could not be able to backup all my data. Having 5% or 10% of redundancy of my files should be enough to restore it in case of corruption without taking too much space.

Port of the week : dnscrypt-proxy

Written by Solène, on 19 October 2016.
Tags: #unix #security #portoftheweek #dns

Today I will talk about net/dnscrypt-proxy. This let you encrypt your DNS traffic between your resolver and the remote DNS recursive server. More and more countries and internet provider use DNS to block some websites, and now they tend to do “man in the middle” with DNS answers, so you can’t just use a remote DNS you find on the internet. While a remote dnscrypt DNS server can still be affected by such “man in the middle” hijack, there is a very little chance DNS traffic is altered in datacenters / dedicated server hosting.

The article also deal with unbound as a dns cache because dnscrypt is a bit slow and asking multiple time the same domain in a few minutes is a waste of cpu/network/time for everyone. So I recommend setting up a DNS cache on your side (which can also permit to use it on a LAN).

At the time I write this article, their is a very good explanation about “how to install it” is named dnscrypt-proxy–1.7.0p1 in the folder /usr/local/share/doc/pkg-readmes/. The following article is made from this file. The file on OpenBSD 6.0 don’t speak about unbound (while in -current, next 6.1, this is explained).

While I write for OpenBSD this can be easily adapted to anthing else Unix-like.

Resolv.conf

Modify your resolv.conf file to this

/etc/resolv.conf :

nameserver 127.0.0.1
lookup file bind
options edns0

If you use dhcp, you can use the following line to force having 127.0.0.1 as nameserver by modifying dhclient config file

Dhcp ?

/etc/dhclient.conf :

supersede domain-name-servers 127.0.0.1;

Unbound

Now, we need to modify unbound config to tell him to ask DNS at 127.0.0.1 port 40. Please adapt your config, I will just add what is mandatory. Unbound configuration file isn’t in /etc because it’s chrooted

/var/unbound/etc/unbound.conf :

server:
    # ↓ this line is MANDATORY ↓
    do-not-query-localhost: no

forward-zone:
    name: "."
    forward-addr: 127.0.0.1@40
    # ↑ address dnscrypt listen on

If you want to allow other to resolv through your unbound daemon, please see parameters interface and access-control. You will need to tell unbound to bind on external interfaces and allow requests on it.

Dnscrypt-proxy

Now we need to configure dnscrypt, pick a server in the following LIST /usr/local/share/dnscrypt-proxy/dnscrypt-resolvers.csv, the name is the first column.

As root type the following (or use doas/sudo), in the example we choose dnscrypt.eu-nl as a DNS provider

rcctl enable dnscrypt_proxy
rcctl set dnscrypt_proxy flags -E -m1 -R dnscrypt.eu-nl -a 127.0.0.1:40
rcctl start dnscrypt_proxy

Conclusion

You should be able to resolv address through dnscrypt now. You can use tcpdump on your external interface to see if you see something on udp port 53, you should not see traffic there.

If you want to use dig hostname -p 40 @127.0.0.1 to make DNS request to dnscrypt without unbound, you will need net/isc-bind which will provide /usr/local/bin/dig. OpenBSD base dig can’t use a port different than 53.

Port of the week : pwgen

Written by Solène, on 12 August 2016.
Tags: #security #portoftheweek

I will talk about security/pwgen for the current port of the week. It’s a very light executable to generate passwords. But it’s not just a dumb password generator, it has options to choose what kind of password you want.

Here is a list of options with their flag, you will find a lot more in the nice man page of pwgen :

  • -A : don’t use capital letters
  • -B : don’t use characters which could be missread (O/0, I/l/1 …)
  • -v : don’t use vowels
  • etc…

You can also use a seed to generate your “random” password (which aren’t very random in this case), you may need it for some reason to be able to reproduce password you lost for a ftp/http access for example.

Example of pwgen output generating 5 password of 10 characters. Using –1 parameter so it will only display one password per line, otherwise it display a grid (on column and multiple lines) of passwords.

solene:~# pwgen -1 10 5
fohchah9oP
haNgeik0ee
meiceeW8ae
OReejoi5oo
ohdae2Eisu

Stop being tracked by Google search with Firefox

Written by Solène, on 04 July 2016.
Tags: #security #firefox #web

When you use google search and you click on a link, you a redirected on a google server that will take care of saving your navigation choice from their search engine into their database.

  1. This is bad for your privacy
  2. This slow the process of using the search engine because you have a redirection (that you don’t see) when you want to visit a link

There is a firefox extension that will fix the links in the results of the search engine so when you click, you just go on the website without saying “hello Google I clicked there” : Google Search Link Fix

You can also use another web engine if you don’t like Google. I keep it because I have best results when searching technical. I tried to use Yahoo, Bing, Exalead, Qwant, Duck duck go, each one for a few days and Google has the bests results so far.