Policies for Open Source use at the workplace

Things I like about some companies’ approach to software installation on employees’ machines:

  • No mandatory and restrictive centralized software distribution system where all employees have to hope and pray that certain tools are available and up-to-date.
  • No URL blocking or similar measures of distrust that would prevent employees from downloading and installing software.

This leaves room for a focus on education, knowledge sharing and trust that employees act responsibly.

Based on my experience, these are some useful policies / objectives for Open Source use at the workplace (especially for IT companies):

  • All employees who work with a computer learn the basics about Open Source licenses (OSI definition, copyleft vs permissive, see link below).
  • All employees who work with a computer learn about the differences between Open Source, Freeware and Shareware.
  • The company establishes simple and employee-friendly policies for the use of Open Source at work:
    • Declare the major permissive and weak-copyleft licenses (Apache, MIT, BSD, LGPL, EPL, …) as pre-approved for all software library and tool use.
    • Declare all OSI-approved licenses as pre-approved for stand-alone tools, i.e. where copyleft cannot affect any derived or bundled code developed at the workplace.
    • Software that satisfies the pre-approval criteria above should not require any further request or approval process.
  • Additionally, the company could maintain a “blacklist” of software (versions) that are known to have security flaws or other aspects that make them unsuitable for use at the workplace.
  • Recommendations can be given to watch out for and uncheck unwanted add-ons during software installations.
  • Software development companies should train their team leads in basic legal aspects like software copyright and license terms.

Nightly file server backups to external harddrive

I use a small headless Debian system as a file server for all family photos, videos, documents, etc. Its hostname is “bubba”. I have recently set it up to run backups to an external hard drive, using cron and rsync.

The external disk is a 500 GB laptop SATA disk in a USB/eSATA enclosure. It requires no separate power supply. So far I have only got it to work over USB. Somehow the eSATA does not work for me on Debian 6 (aka “squeeze”), even though the file server has an eSATA port.

Prerequisites


sudo mkdir /mnt/backup
sudo apt-get install ntfs-3g

NTFS mount/unmount with sudo

I use NTFS as the filesystem on the backup disk because we wanted it to be compatible with MS Windows. The Debian Linux on the file server uses ntfs-3g for mounting the disk read-write. Unfortunately that only works well with root rights, so I configured sudo to permit myself password-less mounting and unmounting of the device.

/etc/sudoers entry
oliver ALL = NOPASSWD: /bin/mount /mnt/backup, \
                       /bin/umount /mnt/backup

Nightly rsync

The nightly backup process itself is a simple non-destructive local rsync command, wrapped by mount and unmount commands, to make sure that we can unplug the external disk anytime we want (just not around midnight).

My crontab:


oliver@bubba:~$ crontab -l
0 0 * * * /home/oliver/shared/scripts/backup.sh

The backup.sh script
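#!/bin/sh

# Start from a clean state: unmount if a previous run left the disk mounted.
if mountpoint -q /mnt/backup; then
  sudo umount /mnt/backup
fi

# Only run rsync if the disk could be mounted, i.e. if it is plugged in.
if sudo mount /mnt/backup; then

  rsync -avvih --progress \
    --exclude /downloads \
    --exclude /movies \
    /home/storage/ /home/oliver/backup \
    > /tmp/cron_output.log 2>&1

  # Unmount again so the disk can be unplugged at any time.
  sudo umount /mnt/backup
fi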

Symlinks and fstab

Symlink in my home for convenience:

oliver@bubba:~$ ls -l /home/oliver/backup
lrwxrwxrwx 1 oliver users 11 Aug 7 21:38 /home/oliver/backup -> /mnt/backup/

Entry in /etc/fstab:

oliver@bubba:~$ grep "/mnt/backup" /etc/fstab
/usr/local/share/backup /mnt/backup ntfs-3g defaults 0 0

Device symlink /usr/local/share/backup:

oliver@bubba:~$ ls -l /usr/local/share/backup
lrwxrwxrwx 1 root staff 84 Jun 23 01:51 /usr/local/share/backup -> /dev/disk/by-id/usb-WDC_WD50_00BPVT-00HXZT3_FDC0FD500000000FD0FF61A6103926-0:0-part1

Try it manually


/home/oliver/shared/scripts/backup.sh &
less /tmp/cron_output.log

Room for improvement

The symlink to the device file is the ugliest part of the whole solution. Currently I have to plug the disk directly into a USB slot on the file server because if I connect it via a USB hub, it will appear under a different name in /dev/disk/by-id and my symlink won’t work. I would like to use a udev rule instead that automatically creates an identical symlink no matter how the disk is plugged in.
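A possible alternative to a udev rule (a different technique, not the one described above) would be to let fstab refer to the partition by its filesystem UUID, which belongs to the filesystem and not to the USB path. The following is only a sketch: `fstab_line` is a hypothetical helper and the UUID is a made-up placeholder.

```shell
#!/bin/sh
# Sketch: print an /etc/fstab entry that identifies the backup partition by
# filesystem UUID instead of by the /dev/disk/by-id symlink.
#
# The real UUID can be looked up on the file server with:
#   ls -l /dev/disk/by-uuid/
fstab_line() {
  uuid=$1
  mount_point=$2
  printf 'UUID=%s %s ntfs-3g defaults 0 0\n' "$uuid" "$mount_point"
}

# Placeholder UUID, for illustration only.
fstab_line "0123456789ABCDEF" /mnt/backup
```

The printed line would replace the existing /mnt/backup entry in /etc/fstab. The sudoers entry and the backup script only mention /mnt/backup, so they should not need to change.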

I would also like to implement a 2-way backup so that files we put on the external disk, for example photos from a trip to relatives, will be mirrored to the file server. It should be just another rsync command going in the opposite direction.
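A rough sketch of what such a 2-way sync could look like, assuming that nothing should ever be deleted on either side. `two_way_sync` is a hypothetical function name, and I have not run this against the NTFS disk.

```shell
#!/bin/sh
# Sketch of a two-way sync: copy in both directions, never deleting anything.
# --update makes rsync skip files that are newer on the receiving side.
two_way_sync() {
  server_dir=$1   # e.g. /home/storage
  disk_dir=$2     # e.g. /home/oliver/backup
  rsync -a --update "$server_dir/" "$disk_dir/" &&
  rsync -a --update "$disk_dir/" "$server_dir/"
}

# Example call, to be placed between the mount and umount commands:
#   two_way_sync /home/storage /home/oliver/backup
```

This leaves out the excludes of the nightly job, and a file deleted on one side will be copied back from the other side on the next run.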

Maybe I would also like the backup process to start right away when the disk is plugged in, in addition to the nightly cron job. This would probably require another udev rule.

My current IntelliJ code inspections profile for Java projects

I recently exported my IntelliJ code inspections profile for Java projects from IntelliJ Community Edition to share it with whoever might be interested.

These highly customized code inspections are based on industry standards like the official Java code conventions, various best practices from the Java community and my experience over many years as a Java developer and team lead, trying to ensure code quality and maintainability.

Feel free to download and save it as a local XML file. Then you can import it into any of your IntelliJ projects via Analyze – Inspect Code – Inspection profile – […] button – Import.

Blogging in the open

In many ways it is a step in a good direction that many companies use internal “collaboration portals” with chat forums, wikis, etc. But it also segregates these corporate communities from the public web. Certainly, company-internal proprietary knowledge and business discussions belong behind the firewall, but that is only a fraction of the knowledge sharing going on at work.

It is unfortunate that even non-proprietary conversations about Open Source tools and technologies get separated from the public internet. I think at least the so-called “personal blogs” that many corporate collaboration portals offer belong in the worldwide “blogosphere”, not on intranets or other “walled garden” systems.

Judging from my own experience with oldoldo.wordpress.com, bloggers and professional software developers actually benefit from using public blogging services. Platforms like WordPress are very good at making articles findable via search engines like Google, and they support categorization, attachment management and update notifications to Twitter, LinkedIn, etc. And on the whole, a public blog generates more useful feedback and discussions than limiting the same thing to the people who happen to work for the same employer.

To directly notify coworkers about new posts on my blog, I simply post the link (i.e. the hopefully permanent URL) on my employer’s collaboration site, not the content itself.

And if I ever change jobs, I will still have access to my own posts. That certainly helps an older guy like me, with a weak memory for details … :)

Uncertain future for Excito Bubba home servers

I own an Excito Bubba/2 file and print server, running Debian Squeeze. Mostly I am quite happy with it.

Recently the CTO of the Swedish manufacturer announced that Excito is shifting focus. The Bubba/3 product is no longer marketed on the main excito.com website, but is being sold off at discount prices in their web store.

This shift seems to be a logical step given the split of the company’s original four founders a few years ago and Excito’s ongoing struggle to support the versatile Debian-based Bubba servers.

Tor Krill and PA Nilsson, the two founders who left Excito a while ago, formed OpenProducts and were planning to take over support of the B3, but recently decided to cancel that takeover.

It is uncertain if Excito’s Bubba product line and its customized Debian distribution will survive. Open-sourcing their proprietary Debian packages would certainly be nice. I have tried to initiate a discussion on the Excito forums about this.

Sqoop daily Oracle data into Hive table partition

The following bash script can be used to import Oracle records into a Hive table, partitioned by date. It uses Sqoop. Both Hive and Sqoop are part of typical Hadoop distributions, like the Hortonworks Sandbox, for example.

#!/bin/bash

# Convert an argument to upper case.
upper() {
  echo "$1" | tr '[:lower:]' '[:upper:]'
}

if [ $# -ge 4 ]; then
  schema=$(upper "$1")
  table=$(upper "$2")
  column_to_split_by=$(upper "$3")
  date_column=$(upper "$4")
  # Default to the current date if no date value was given.
  date_value="${5:-$(date +%Y-%m-%d)}"
else
  echo
  echo "Usage: $(basename "$0") schema table column-to-split-by date-column [YYYY-MM-DD]"
  echo
  echo "Imports all records where value of date-column is \$date_value from"
  echo "Oracle table \$schema.\$table as a Hive table partition."
  echo "Hadoop will split the import job based on the column-to-split-by."
  echo "* The table must have the columns specified as column-to-split-by and date-column."
  echo "* The column-to-split-by must be finer granularity than date-column, ideally unique."
  echo "* The date_value must be in YYYY-MM-DD format."
  echo "* If date_value is unspecified, the current date will be used."
  exit 1
fi

echo "schema = $schema"
echo "table = $table"
echo "column_to_split_by = $column_to_split_by"
echo "date_column = $date_column"
echo "date_value = $date_value"

# we have to drop the partition, because --hive-overwrite does not seem to do it
hive -e "use $schema; alter table $table drop if exists partition($date_column='$date_value');"

columns=$( \
sqoop eval \
--options-file /usr/local/etc/sqoop-options.txt \
--query "select column_name from all_tab_columns where owner = '$schema' and table_name = '$table'" \
| tr -d " |" \
| grep -Ev "\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|COLUMN_NAME|$date_column" \
| tr '\n' ',' \
| sed -e 's/\,$//'
)

query="select $columns from $schema.$table \
       where $date_column = to_date('$date_value', 'YYYY-MM-DD') \
       and \$CONDITIONS"

echo "query = $query"

sqoop import \
--options-file "/usr/local/etc/sqoop-options.txt" \
--query "$query" \
--split-by "$column_to_split_by" \
--target-dir "$schema.$table" \
--hive-import \
--hive-overwrite \
--hive-table "$schema.$table" \
--hive-partition-key "$date_column" \
--hive-partition-value "$date_value" \
--outdir $HOME/java
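The pipeline that builds the column list is probably the least obvious part of the script. The sketch below replays it on canned input. The sample table and its column names are my assumption of what the sqoop eval output roughly looks like, not captured output, and `column_list` and `sample_output` are hypothetical helper names. Unlike the script above, this variant anchors the patterns so that only an exact match of the date column is dropped.

```shell
#!/bin/sh
# Turn table-style query output (read from stdin) into a comma-separated
# column list, leaving out the partition column given as $1.
column_list() {
  date_column=$1
  tr -d ' |' \
    | grep -Ev -e '^-*$' -e '^COLUMN_NAME$' -e "^${date_column}\$" \
    | tr '\n' ',' \
    | sed -e 's/,$//'
}

# Assumed shape of the query output, for illustration only.
sample_output() {
  cat <<'EOF'
------------------------
| COLUMN_NAME          |
------------------------
| ID                   |
| NAME                 |
| CREATED_DATE         |
| UPDATED              |
------------------------
EOF
}

# Prints: ID,NAME,UPDATED
echo "$(sample_output | column_list CREATED_DATE)"
```

The date column is left out of the select list because Hive receives its value through the partition key and partition value options instead.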

JDBC connection details

Put them into /usr/local/etc/sqoop-options.txt, in a format like this:

--connect
jdbc:oracle:thin:@hostname:port:SID
--username
oracle_username
--password
oracle_password

Make apps4halifax – Intro

Halifax Regional Municipality is introducing ‘apps4halifax’, its first-ever Open Data App Contest. Similar initiatives have been successful in Ottawa, Edmonton and many other cities worldwide.

Residents can submit ideas or code apps using the HRM Open Data catalog. The best submissions may win cash prizes and awards.

The Open Data catalog is implemented using the Socrata Open Data Portal. The SODA 2.0 RESTful web service API allows developers to query and consume live data from the public data-sets.

The Socrata developer documentation explains how to use queries to endpoints, supported datatypes and response formats.

The currently available data-sets include Crime occurrences, Building types, Buildings, Bus Routes, Bus Stops, Bylaw Areas, Civic Addresses, Community Boundaries, Park Recreation Features, Parks, Polling Districts, Streets, Trails, Transit Times, Waste collections and Zoning Boundaries.

You can construct web service query URLs like this:

http://www.halifaxopendata.ca/resource/RESOURCE-ID.json?COLUMN-NAME=VALUE

You can determine the RESOURCE-ID for a dataset like this:

  1. Go to https://www.halifaxopendata.ca/
  2. Click on a dataset name
  3. Click the “Export” button
  4. Under “Download As”, copy one of the links, e.g. JSON
  5. The resource id is the part of the URL between “views/” and “/rows.json”
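Step 5 can also be done with a few lines of shell. This is only a sketch: `resource_id_from_export_url` is a hypothetical helper, and the example link is my assumption of what a copied export link looks like.

```shell
#!/bin/sh
# Extract the resource id, i.e. the part of an export link between
# "views/" and "/rows.json".
resource_id_from_export_url() {
  url=$1
  rest=${url#*views/}                    # drop everything up to "views/"
  printf '%s\n' "${rest%%/rows.json*}"   # drop "/rows.json" and what follows
}

# Assumed example of a copied JSON export link.
# Prints: ga7p-4mik
resource_id_from_export_url "https://www.halifaxopendata.ca/api/views/ga7p-4mik/rows.json?accessType=DOWNLOAD"
```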

As an extremely useful example, you could query fun things like all HRM garbage collections occurring on Wednesdays:

http://www.halifaxopendata.ca/resource/ga7p-4mik.json?collect='WEDNESDAY'

As a simple and quick way to create web pages that interact with these web services you could use jQuery and its getJSON() function.
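Before writing any web page, the same kind of query can be tried from the command line. A small sketch, where `soda_url` is a hypothetical helper that only glues the URL together. It does no URL-encoding, so it is only suitable for simple column names and values like the ones in the garbage collection example.

```shell
#!/bin/sh
# Build a simple filter query URL for a dataset in the HRM Open Data catalog.
soda_url() {
  resource_id=$1
  column=$2
  value=$3
  printf 'http://www.halifaxopendata.ca/resource/%s.json?%s=%s\n' \
    "$resource_id" "$column" "$value"
}

# Rebuilds the garbage collection query for Wednesdays.
soda_url ga7p-4mik collect "'WEDNESDAY'"

# To fetch the JSON instead of just printing the URL, something like this
# should work on a machine with curl installed:
#   curl -s "$(soda_url ga7p-4mik collect "'WEDNESDAY'")"
```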

I will probably follow up with more posts on this topic soon.