Tuesday, January 27, 2009

Converting LDIF/LDAP data into a CSV file

LDIF to CSV

I work with LDAP on a regularly basis. I frequently have to pull data using ldapsearch. While the data that ldapsearch spits out is a decent representation sometimes I want something a bit easier to work with.

The format I use most tends to be CSV. While the acronym stands for comma separated value the format may be used to describe data that is printed out line by line and separated by anything you can imagine.

The problem for me is finding a decent LDIF to CSV converter. The ones that I have tried tend to choke on any number of issues. Some of these include binary data or failing to normalize the multivalued attributes.

I finally got sick enough of these issues that I decided to write my own. You can download it here:

Python/Linux/Source

Windows

Update: Pushed the code to GitHub: https://github.com/tachang/ldiftocsv

Hopefully the Windows binary works for you. If it doesn't then just download Python 2.6 and run the source manually. I am providing the binary just for convenience.

Using LDIFtoCSV
Running "python LDIFtoCSV.py" should give you the usage text.

usage: LDIFtoCSV.py [options]

-o <filename> : File to write output. By default this is set to sys.stdout
-l <filename> : File to write logging output. By default there is no logging.
-F <char> : Character to separate the fields by. By default this is a comma. i.e. -F","
-D <char> : Character to delimit the text value by. By default this is a double quote. i.e. -D"""
-M <num> : The maximum number of columns a multivalued attribute should take up (default: 5). This is common with the objectClass attribute where it can have over 20 values. Do you want to have 20 columns each with the same heading objectClass or do you want to limit it.

Here are some common command lines that I use (assuming you have a test.ldif):

Outputs the CSV straight to standard output:
python LDIFtoCSV.py test.ldif

Outputs CSV to standard output with semicolons as the delimiter:
python LDIFtoCSV.py -F";" test.ldif
Outputs CSV to standard output with pipes as the delimiter and text surrounded by carrots:
python LDIFtoCSV.py -F"|" -D"^" test.ldif

57 comments:

aabtzu said...

the website for the csv2ldif seems offline or non existent. did you move it somewhere else ?

Jeff Tchang said...

Should be back up. If you decide to try the tool out please let me know what you think!

Bryn said...

Very useful script - you've just saved me a lot of time. Thanks!

Mujtaba said...

Hello,

Thanks for this great program. Is it possible to include all multiple values for the same LDAP attribute as one csv entry separated by semicolons instead of setting a maximum number and limiting the amount of multiple values stored in the CSV?

Ie -
favorite - "Ice cream"
favorite - "Apples"

In the csv file:
..., "Ice cream";"Apples",....

Jeff Tchang said...

@Mujtaba - Right now there isn't but that is a great idea for the next version. Thanks!

Mujtaba said...

Any idea on when the next version will be out? I need this functionality for a project. :)

Jeff Tchang said...

@Mujtaba Right now I do not have any date for a new version. If I need the tool to do additional functionality then I will release a new version. Of course you are free to download the source and modify it to fit your needs. If you are really in a need of customizations feel free to e-mail me directly and we can discuss a nominal rate.

Mujtaba said...

Thank you, I will modify it myself.

register said...

I have addressed similar problems in C++. My solution is simplier but could fit the needs of someone who adopts C++ as the language of choice. Here is the link: http://register72.wordpress.com/2009/07/09/ldif-to-tsv-conversion/

Kenneth said...

This is wonderful, exactly what I need.

Is the .exe and/or source redistributable?

Jeff Tchang said...

@Kenneth

Not sure about the executable (since there might be some Microsoft DLL stuff in there) but feel free to use the source and downloading a python interpreter.

Ramiz said...

This is brilliant. Just as you experienced; I could not get any softwarew that coudl do the conversion well but this has save me a lot of hard work.

I hope to see future enhancements; as this is a tool for the future for all that use the LDAP platform.

gprellwitz said...

We used it to convert our 3000 employee LDIF to a spreadsheet, and it worked like a charm. Thanks! Only thing that would make it better would be the ability to either make a connection directly to the LDAP or to accept the input file from stdin so we could pipe the ldapsearch results to it! For a one-time use, though, it's great as is.

Julian said...

Thanks a lot, just converted a LDIF-file containing 72000 entries. It worked well for me.

Todd said...

Excellent script!

Thank you very much for sharing it.

MGT 5050 Brian G said...

Thank you, Thank you, Thank you!

Worked perfect the first try!

DJ Alik said...

Awesome tool!!! Works like a charm.

Akshay Kulkarni said...

This is perfect. I found it after a long struggle with various other tools..

Thanks much!

Zhang Weiwu said...

I am also fed up with all tools that tries to convert ldif to csv, but having tried this one, it suffer from the same two problems of every other tool I used. Both two problems are because most such tools are written by people from the west with little care on ideographs.

First, your tool like other don't distinguis ideograph string from binary pictures. It is difficult to tell ideographs from pictures, despite different attribute type, they all look like binaries on the command line pipe. The application that can correctly write out ideograph to csv/tsv must either allow user to configure which field is a string and which is binary, or, even better, be schema-aware. To be schema-aware the application must act as ldap client, instead of reading from ldif file, because when exported to ldif file the schema is already lost.

Second, it should offer the possibility to only convert one language version to csv/tsv, or offer user to consider multiple language versions as multiple values. Language extension is not much used in the west. Consider this:

line100: cn: Zhang Weiwu
line101: cn;lang-zh: 張韡武

Line 101 is the Chinese spelling of the same name.

It is not possible, as far as I know, to produce ldif with ldapsearch so that only include line100 but not line101, thus tools must be careful when handling such data source.

I know it is difficult to make tools that doesn't suck for all users, but if you can manage that, it is the kind of achievement that many others before you never even manage to get close to.

Jeff Tchang said...

@Zhang Out of curiosity how does the program fail?

Zhang Weiwu said...
This comment has been removed by the author.
Zhang Weiwu said...

Behavior of your system:

# the blog doesn't accept HTML pre tag. From now on is start of pre
$ ldapsearch_intranet cn=Yu*Liansu givenName sn cn title jobtitle organization organizationalUnit postalcode country st locality street telephonenumber fax labeledURI description | tee /tmp/test.ldif
# extended LDIF
#
# LDAPv3
# base with scope subtree
# filter: cn=Yu*Liansu
# requesting: givenName sn cn title jobtitle organization organizationalUnit postalcode country st locality street telephonenumber fax labeledURI description
#

# yuliansu, contacts, realss.com, eoa.cn
dn: uid=yuliansu,ou=contacts,ou=realss.com,dc=eoa,dc=cn
description: Assistant, started to work in mid-2005
sn:: 5L+e
sn;lang-en: yu
givenName:: 6I6y6IuP
cn:: 5L+e6I6y6IuP
cn;lang-en: yuliansu
givenName;lang-en: liansu
userPassword:: e21kNX1RdmdhTytSZkw5NTB0bUNDM09JamtBPT0=
telephoneNumber: 010 69020190
st:: 5YyX5Lqs
st;lang-en: beijing
street:: 5rW35reA5Yy65LiH5p+z5LiH5rOJ5paw5pawMTHmoIsz6ZeoMTAx5a6k
street;lang-en: haidianquwanliuwanquanxinxin11dong3men101shi
l:: 5YyX5Lqs5biC
l;lang-en: beijingshi
postalCode: 100089
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: evolutionPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: phpgwAccount
objectClass: qmailUser

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

$ python /tmp/LDIFtoCSV.py /tmp/test.ldif
"cn","cn;lang-en","description","dn","givenName","givenName;lang-en","l","l;lang-en","objectClass","objectClass","objectClass","objectClass","objectClass","postalCode","result","search","sn","sn;lang-en","st","st;lang-en","street","street;lang-en","telephoneNumber","userPassword",
"'\xe4\xbf\x9e\xe8\x8e\xb2\xe8\x8b\x8f'","yuliansu","Assistant, started to work in mid-2005","uid=yuliansu,ou=contacts,ou=realss.com,dc=eoa,dc=cn","'\xe8\x8e\xb2\xe8\x8b\x8f'","liansu","'\xe5\x8c\x97\xe4\xba\xac\xe5\xb8\x82'","beijingshi","top","person","organizationalPerson","evolutionPerson","posixAccount","100089","","","'\xe4\xbf\x9e'","yu","'\xe5\x8c\x97\xe4\xba\xac'","beijing","'\xe6\xb5\xb7\xe6\xb7\x80\xe5\x8c\xba\xe4\xb8\x87\xe6\x9f\xb3\xe4\xb8\x87\xe6\xb3\x89\xe6\x96\xb0\xe6\x96\xb011\xe6\xa0\x8b3\xe9\x97\xa8101\xe5\xae\xa4'","haidianquwanliuwanquanxinxin11dong3men101shi","010 69020190","{md5}QvgaO+RfL950tmCC3OIjkA==",
"","","","None","","","","","","","","","","","0 Success","2","","","","","","","","",
# end pre tag here

If problem 1 solved, all appearance of things like \x9f\xb should be ideograhs like 莲.

If problem 2 solved, there should be ways to omit every column whose field name is "*;lang-en".

It's great you want to look at this issue. As an asian user I cannot help feeling this:

the world is not better because there are 10+ tools, it would be better if there is only one tool that does things well.

And often I read "I decide to make my own tool" on the web is a creation again ignoring Asian text, not only ldap tools but all tools. You cannot help being critical after all such experiences, even given all tools are granted as free in freedom without charge:)

lala said...

Thanks for your post and welcome to check: here.

Jason said...

The download appears not to be working again. Can you check it?

netarc said...

The download links don't appear to be working, fyi. BTW, is your tool the same as the "LDIF2CSV" one on softpedia, wondering if you published it there?

ganesh said...

the download isn't working?? do you know of a way where i can convert multiple ldif files 19000 to be precise into csv and load it onto a DB? if anyone has any answers please lemme knw..

Jeff Tchang said...

I just updated the links. I will be putting it on github shortly.

'Spark said...

Thanks very much for your useful tool. If you ever have time to implement Mujtaba's suggestion (collect multi-values into one delimited string), it would be extremely beneficial for my use-case. Cheers :)

Jason Lockwood said...

Thanks for the very useful tool.

Dheeraj said...

nice post very well written. i appreciate quality of writing u have SEO Company India

Aaliyah sam said...

I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

Frases de Amor

frankleo said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit
Plumbing Greenville

jimmy said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

Houston Electrician

vikas kukreja said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

Electrician in Houston

smash john said...

I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

Reliable VPN Proxy Service

nash said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

Bakersfield Web Design

josef den said...

I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

windows vps server

harrywatson said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

cheap voip call

john ibrahim said...

I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

compra venta

geetika madan said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

segunda mano

micheal j said...

I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

electrician in phoenix

immy shane016 said...

I would like to thank you for the efforts you have made in writing this post. I am hoping the same best work from you in the future as well.

website design Birmingham

geet said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

vpn liberty reserve

charlie smash said...

I would like to thank you for the efforts you have made in writing this post. I am hoping the same best work from you in the future as well.

Escorts San Diego

charlie smash said...

I would like to thank you for the efforts you have made in writing this post. I am hoping the same best work from you in the future as well.

Escorts San Diego

craig pattinson said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

used cnc router

geet said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit
Electrician Greenville SC

harrywatson said...

Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit
Greenville air conditioning

Alessandro Mazzi said...

Fantastic, it helped me to migrate contacts in LDIF format into GMail!!!

Alessandro Mazzi said...
This comment has been removed by the author.
Ankur Dharek said...

Hello Jeff, sorry to contact you so randomly.. I was seeking some help from online and I believe you might be able to help me. I am currently working on a project where I have to migrate a table from DB2 to LDIF and import into IBM LDAP. Would you please able to share some knowledge on this?


Thanks
Ankur

Gurpreet si said...

nice stuff here to read thanks earn online

Arjen van Bochoven said...

I think I fixed the utf character problem:

In ldif.py, I added .encode('utf8') to line 325:

attr_value = base64.decodestring(unfolded_line[colon_pos+2:]).decode('utf8')

in LDIFtoCSV.py I changed line 128 to 139 to

if( attributeName in entry ):
i = 0
while( i < numberOfTimesToPrint ):

if( i < len(entry[attributeName])):
self.defaultOutput.write(self.textDelimiter + entry[attributeName][i].encode('utf8') + self.textDelimiter + self.fieldSeparatorCharacter)
else:
self.defaultOutput.write(self.textDelimiter + self.textDelimiter + self.fieldSeparatorCharacter)

i = i + 1

# If the attribute name is dn, print the fully qualified distinguished name

Now this ldif:

dn: CN=MACHINE,OU=Domain Controllers,DC=our,DC=domain
name: MACHINE
operatingSystem:: V2luZG93cyBTZXJ2ZXLCriAyMDA4IFN0YW5kYXJk

Converts correctly to:

Windows Server® 2008 Standard

Arjen van Bochoven said...

I meant

In ldif.py, I added .decode('utf8') to line 325:

attr_value = base64.decodestring(unfolded_line[colon_pos+2:]).decode('utf8')

Sushmi said...

Hi,

I am loooking fro CSV to Ldif conversion,and this is very urgent for me...I see so many tools online and getting confused.
Can you pl. advise?

Fernando Ruza Rodriguez said...

Perfect !! - Cojonuda !!

It does exactly what I wanted

LDAPWork said...

Great Work, Thanks for providing this Utility, windows utility worked great for me.