Recently I noticed a tweet questioning whether capture software should run in the cloud. I expected the answer to need more nuance than fits in (two times) 140 characters, but I gave it a try anyway.
Chuck Romano’s initial blog post was triggered by his ‘discovery’ of Ephesoft. I ended up in a twiscussion with Chuck (Global360, Open Text these days), which led to this blog (the first of a series?). I know the ‘solutions’ described here are (over)simplified, but that doesn’t harm my point. It does keep this blog at a ‘decent’ size.
There are two main tracks in this discussion:
1) The same arguments apply to any application that uses data in the cloud. The question is broader than just ‘capture in the cloud’: it applies equally to DMS or CRM in the cloud, and in fact to any business application running ‘anywhere’.
2) Ephesoft uses ‘cloud’ in its marketing language, but this does not mean it runs only in the cloud. In my opinion the term is more of a technology reference and a differentiator: it characterizes their software. As a 100% server-based application, Ephesoft can run in the cloud, but it does not have to.
What is ‘the cloud’?
‘The cloud’ is a poorly defined buzzword. For this blog I define it as “processing and/or storing content somewhere outside the organization’s network”, or in short: ‘stuff’ (content/data and/or its processing) is ‘somewhere else’. A third party is responsible for providing and/or maintaining one or more functions such as infrastructure, operating systems, applications and/or storage of data and content. Examples are Salesforce, Dropbox and Google Apps, but also a server at Rackspace or Amazon running our Alfresco Share instance.
1) Having or processing your stuff in the cloud
This has many aspects. I will touch on only a few, briefly; you could spend quite a few blog posts on this topic alone.
One of them is trust in the third party managing your technology. That covers both the application itself and the transport from ‘somewhere’ to wherever you are located.
It is a valid question whether confidential or privacy-related content should be stored or processed outside the company firewall. Medical records, personal information, sensitive business information: do you trust the whole chain of applications, operations, maintenance, updates, external personnel and the network? Are external web servers and firewalls configured correctly? Is the application hardened? Are security patches tested and applied? Do the third party’s employees take USB sticks with your sensitive information home now and then?
The problem is that you don’t know, and you can’t control it. (And if you host the application internally, you probably can’t either.)
Is the application dedicated to your organization? Is the server dedicated to your organization or application? With a private system you can probably put sensitive information into it more easily, since you have more control. But what if you share the server or application, as with Salesforce? Is the system multi-tenant, and if so, is the data physically separated per organization (for example, a separate database for each)? Is that a must? Is it guaranteed? Is separation by the application’s internal access policies good enough, or equally secure? Is there any guarantee that one organization cannot access another organization’s information? And what if there is an ‘unintended human configuration error’? Or a bug? These are questions I don’t have the answers to. It will depend on your organization’s policies and its estimates of risk and damage. Personally, I believe the type of information being stored or processed would influence how I feel about certain questions and answers.
That ‘stuff’ is ‘somewhere else’ says nothing about how you can control your own stuff there. Let’s focus on the infrastructure (‘somewhere else’) and set the application (‘how’) aside for now.
Since the ‘stuff’ is somewhere else, the question is how to get from ‘here’ to ‘somewhere else’ safely. In the past you did not send confidential paper around the world in an open envelope, and digitally you should not either. (Personally I don’t understand why almost all email, plain text after all, is sent unencrypted, but that’s off-topic for now.) Since your application (let’s assume a server-based web application like Ephesoft) is running remotely, you want a secure channel to carry the UI (web pages), images and files to your desktop or mobile device. I see two distinct scenarios:
- The remote service is a shared service, not a dedicated one (think Salesforce, Google Apps)
Any hosted, shared application offered ‘as a service’ should communicate over widely available encrypted protocols (HTTPS, secure FTP flavours such as FTPS or SFTP, CMIS over HTTPS). This way each individual session is secured in the same way as your communication with your bank.
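To make the ‘like your bank’ comparison concrete, here is a minimal client-side sketch in Python, using only the standard library. It assumes the service is reached over HTTPS; the TLS 1.2 floor and the URL in the usage comment are my own illustrative choices, not anything Ephesoft or any particular SaaS vendor prescribes.

```python
import ssl
import urllib.request


def make_client_context() -> ssl.SSLContext:
    """TLS settings for talking to a shared, hosted service."""
    # create_default_context() loads the system's trusted CA certificates and
    # turns on certificate verification and host-name checking, the same
    # checks a browser performs when you visit your bank.
    ctx = ssl.create_default_context()
    # Refuse old protocol versions; TLS 1.2 is an assumed floor, not a vendor requirement.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch(url: str) -> bytes:
    """Download `url` over HTTPS; raises if the server certificate is invalid."""
    with urllib.request.urlopen(url, context=make_client_context(), timeout=10) as resp:
        return resp.read()


# Usage (hypothetical URL):
# page = fetch("https://capture.example.com/")
```

The point is that nothing here is specific to capture: any shared service can be secured per session with the same standard building blocks.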
- The remote service is dedicated (like our private Alfresco instance)
Although the approach from the shared scenario is still valid, you can also consider a VPN tunnel from your internal network to the remote server. This way all protocols benefit from the secure tunnel, and the tunnel can be managed by your IT department.
Technologically, it should not be a problem to have your ‘stuff’ running or stored ‘somewhere else’. The funny thing is that you can use widely available, proven technology, very similar to what you use to build your multi-site LAN/WAN.
The ‘weak spot’ of the cloud is that it happens ‘online’. If your network is unreliable, you have a challenge. If your ISP has issues, you have no connection. This is a serious threat if your business stops whenever the connection to your cloud application is down. You can create redundancy by doubling your Internet connection (physically?!), using multiple providers and routes to the cloud application (which itself might be redundant or have failover network access too).
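Redundant lines and providers are mostly a network matter, configured in routers rather than in application code. The client-side counterpart is simple, though: try one route to the cloud application, and fall back to the next when it fails. The Python sketch below assumes the application is reachable under one host name per provider; the host names and the `first_reachable` helper are hypothetical.

```python
import socket
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_reachable(endpoints: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try each route to the cloud application in order and return the
    first connection that succeeds.

    `connect` is passed in so the fallback policy can be tested without a
    network. Raises ConnectionError listing every failure if no route works.
    """
    failures: List[str] = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except OSError as exc:  # refused, unreachable, timed out, DNS failure...
            failures.append(f"{endpoint}: {exc}")
    raise ConnectionError("all routes failed: " + "; ".join(failures))


def tcp_connect(host: str) -> socket.socket:
    """Open a TCP connection to port 443 with a short timeout."""
    return socket.create_connection((host, 443), timeout=5)


# Usage (hypothetical host names, one reachable via each provider):
# sock = first_reachable(["capture.provider-a.example", "capture.provider-b.example"], tcp_connect)
```

This only helps if the routes really are independent: two host names that share the same physical line or provider still fail together.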
2) Ephesoft in the cloud
Ephesoft uses ‘cloud’ in its marketing language. The Ephesoft software was designed, and the company is run, by very knowledgeable and experienced ex-Kofax guys. They applied their lessons learned and created a new capture application. Like any software application, it is not one-size-fits-all, but it has major benefits. To mention a few:
- Ephesoft is a server-based web application; operation and administration require only a web browser
- based on proven open source components and open standards
- based on existing and proven internet protocols
- easily scalable in throughput, with failover
- technically supports an ‘infinite’ number of clients, anywhere
- zero-install client (the client is a web browser, yet still keyboard-controlled for top speed!)
- bulk scanning and distributed scanning can share the same capture process(es) (think of all your local desktop scanners and multifunctionals)
- central processing
- export of scans and captured metadata to file, CMIS or custom export
- some brilliant features to capture metadata from scanned documents (capture against a table of your expected invoices?)
- ‘capture as a service’-like features:
  - batches are assigned to groups
  - groups only know about their own capture batches
- commercial open source
- simple and predictable cost structure (based on CPUs, not on scans, documents, languages or users)
Since Ephesoft is built as a scalable web application, all standard web technology can be applied. Yes, it can also easily run and scale in the cloud. But many organizations I know of run it within their own WAN/LAN. As said before, to me ‘cloud’ in Ephesoft’s marketing points to the technology, product characteristics and possibilities. It is certainly not a ‘must’, nor a ‘should’. And yes, there are organizations investigating ‘capture as a service’-like concepts to offer to their customers. That is possible too.
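The group-based batch visibility in the list above can be illustrated with a small Python sketch. This is not Ephesoft’s data model or API: `Batch`, `group` and `visible_batches` are hypothetical names, chosen only to show the idea of scoping every batch to a group and filtering by the user’s group memberships.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Batch:
    batch_id: str
    group: str  # the group (e.g. one customer or department) this batch is assigned to


def visible_batches(batches: Iterable[Batch], user_groups: Iterable[str]) -> List[Batch]:
    """Return only the batches assigned to one of the user's groups.

    Meant to be applied on the server side, so a client never receives
    batches of groups it does not belong to.
    """
    allowed = set(user_groups)
    return [b for b in batches if b.group in allowed]


batches = [Batch("B1", "acme"), Batch("B2", "globex"), Batch("B3", "acme")]
print([b.batch_id for b in visible_batches(batches, ["acme"])])  # ['B1', 'B3']
```

Note that this is exactly the ‘separation by the application’s internal access policies’ discussed earlier: a single misconfigured group assignment or a bug in such a filter is what decides whether one customer can see another’s batches.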
Some questions to think about:
- What is the conceptual difference in requirements between an organization with multiple departments using capture, each processing its own sensitive information, and a SaaS-like approach?
- What is the technical difference between running capture in a multi-site organization and running it in the cloud?
- What is the difference in risk between an organization’s (multi-site?) ‘internal’ network being down and the Internet being unavailable?
Cloud (technology) can be a cost saver. Cloud also has some disadvantages. If you don’t trust the chain from your desktop or tablet to your ‘stuff’ stored or running ‘somewhere’, don’t go there. If your content is confidential or sensitive, or legislation forbids it, don’t go there. This is of course common sense for any application an organization uses outside its firewall. Especially with a private server in the cloud, there is not that much difference between cloud and internally deployed applications.
Ephesoft uses ‘the cloud’ in its marketing as an advantage over its competition. Their ‘cloud’ is not a must; it is a pointer to technology and possibilities. Ephesoft is often installed on-premises. And yes, also in the cloud. The real question is: can my ‘stuff’ be captured and processed ‘somewhere else’? That is all about the type of content, policies, legislation and trust. Not about technology. Not about Ephesoft.
Ephesoft (like Alfresco) is as flexible in the cloud as in your own network. It can be used from anywhere (if you allow it to) and in your web browser or on your device of choice (if you allow it to). Ephesoft offers the characteristics, flexibility and possibilities to do so. Considering your content and purpose, it is your choice what you feel comfortable with.
[update 7 nov 2011; re-added the ‘more’ link]