So, even if the object has accessor methods to control how the object's attributes are manipulated:
    $obj->set_name("ob1");
    print $obj->get_name();

it's still possible to access the data directly:

    $obj->{_name} = "ob1";
    print $obj->{_name};

But if the get_name and set_name methods do anything other than simply retrieve and set the underlying hash entry (for example, checking the assigned value's validity, or logging retrievals), then directly accessing the data in this way may introduce subtle bugs into the program.
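The difference is easy to see in a minimal sketch. The Person class and its validation rule below are hypothetical, invented only to give set_name something to check:

```perl
use strict;
use warnings;

package Person;    # hypothetical class, for illustration only

sub new { my ($class) = @_; return bless { _name => undef }, $class }

# The "official" interface: set_name validates before storing
sub set_name {
    my ($self, $name) = @_;
    die "invalid name\n" unless defined $name && $name =~ /\S/;
    $self->{_name} = $name;
    return $self;
}

sub get_name { return $_[0]{_name} }

package main;

my $obj = Person->new;

# Going through the accessor: the empty name is rejected
my $ok = eval { $obj->set_name(""); 1 };
print "accessor rejected the value\n" unless $ok;

# Going around the accessor: the same bad value is stored silently
$obj->{_name} = "";
print "direct access stored: '", $obj->get_name, "'\n";
```

Nothing in the language objects to the second assignment; only the convention does.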
In practice, this lack of a built-in encapsulation mechanism rarely seems to be a problem in Perl. Most object-oriented Perl programmers use hashes as the basis of their objects, and get by quite happily with the principle of "encapsulation by good manners". The lack of protection for attribute values doesn't matter because users of a class either respect the official interface of its objects (i.e. their methods), or they're smart enough to get away with poking around inside an object without breaking anything.
The only problem is that this culturally enforced encapsulation doesn't scale very well. It's fine for a few hundred lines of code written by a single programmer, but is less successful when the code is tens of thousands of lines long and developed by a group of people. Even if the entire team can be trusted to maintain sufficient programming discipline and to consistently respect the notional encapsulation of attributes (a dubious proposition), accidents and mistakes will happen, especially in rarely used parts of the system.
Moreover, deliberate decisions to circumvent the conventions of encapsulation are rarely documented adequately, leading to problems much later in the development cycle. For example, consider a notionally "private" attribute of an object, which for efficiency reasons is accessed directly in an obscure part of a large system. If the implementation of the object's class changes, that attribute may cease to exist. In a more static language, this would cause an error message to be generated when next some external code attempts to access the (now non-existent) attribute. However, Perl's autovivification of hash entries will silently "recreate" the former attribute whenever it's accessed. The direct access operation proceeds, but now it retrieves or modifies a "phantom" attribute. Bugs such as this can be painfully difficult to diagnose and track down, especially if the original programmer has moved on by the time the problem is discovered.
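A small sketch of the "phantom attribute" scenario, using an ordinary blessed hash. The Account class and its attribute names are hypothetical:

```perl
use strict;
use warnings;

package Account;   # hypothetical class, for illustration only

# Version 2 of the class: the old '_balance' attribute has been
# replaced by '_cents', but some distant code still uses the old name.
sub new     { my ($class, $cents) = @_; return bless { _cents => $cents }, $class }
sub balance { return $_[0]{_cents} / 100 }

package main;

my $acct = Account->new(5000);

# Obscure "optimized" code written against version 1 of the class:
$acct->{_balance} += 10;      # no error: the entry is silently created

print "official balance: ", $acct->balance, "\n";
print "phantom entry:    $acct->{_balance}\n";
print "keys: @{[ sort keys %$acct ]}\n";
```

The update "succeeds", yet the object's real state is untouched and the hash now carries an entry that no method of the class knows about.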
In this approach, the class's constructor creates a lexical hash (say, my %data) and initializes it with the appropriate attribute values. It then creates a new anonymous subroutine that acts as a closure, preserving access to the lexical %data variable, even after the constructor finishes. Finally, the constructor blesses the anonymous subroutine into the class and returns a reference to it. In other words, each object of the class is a subroutine that preserves an otherwise inaccessible lexical variable. Figure 1 illustrates the arrangement.
The key to making this unusual set-up work is the behaviour of the anonymous subroutine. Typically, it takes up to two arguments: a string indicating which attribute is to be accessed, and an optional value to be assigned to that attribute. The subroutine then analyses the arguments and determines whether the access is permitted.
For example, if no second argument is specified, the anonymous subroutine might just return the value of the attribute specified by the first argument. If a second argument is specified (indicating a "set" operation), the subroutine might check whether the attribute is modifiable, perhaps by consulting another lexical hash (say, %public). If the attribute is not universally accessible, then the subroutine might check whether the request came from the right package (i.e. see who caller is) before proceeding.
Once that functionality is in place, each of the object's accessor methods just calls the corresponding anonymous subroutine. The accessor passes the subroutine its own name, and any "new value" argument, whereupon the subroutine decides what to do. Every accessor is therefore structurally identical, so it's easiest to implement them all using a single AUTOLOAD method.
Putting all those components together produces a class declaration like this:
    package Data;                # Personal data
    $VERSION = 1.00;

    my %public =                 # Access info
        ( name=>1, age=>1, phone=>1 );

    sub new {
        my ($class, %data) = @_;
        my $self = sub {
            my ($attr, $newval) = @_;

            # Enforce the encapsulation
            die "no such attribute: $attr"
                unless exists $data{$attr};
            die "inaccessible: $attr"
                unless $public{$attr} || caller eq $class;

            # Provide the access
            $data{$attr} = $newval if @_ > 1;
            return $data{$attr};
        };
        bless $self, $class;
    }

    sub AUTOLOAD {
        my $self = shift;        # remove the object before passing @_ on
        $AUTOLOAD =~ s/.*:://;
        return if $AUTOLOAD eq 'DESTROY';    # no destructor to forward
        return $self->($AUTOLOAD, @_);
    }

The result is that each Data object is a blessed subroutine, and has the only remaining access to the lexical %data. It uses that hash as its own private storage area, getting or setting entries in %data.
The next time Data::new is invoked, a new (and entirely distinct) lexical hash—also called %data—will be created within the constructor. Then a new (and entirely distinct) anonymous subroutine will be created, blessed, and returned. That subroutine will subsequently have the only access to the new %data hash. In this way, every call to Data::new creates a hash that's "guarded" by its own personal subroutine.
Oddly enough, the resulting encapsulation is far stronger than that provided by most other object-oriented languages. Not even the methods of its own class have direct access to an object's data. Instead, they must request access via the encapsulating subroutine.
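The following sketch repeats the closure-based class in condensed form so that it can be run on its own, and then exercises it from the main package. The '_secret' attribute, the DESTROY guard and the variable names in the test code are additions made for this illustration:

```perl
use strict;
use warnings;

package Data;
our $AUTOLOAD;
my %public = ( name => 1, age => 1, phone => 1 );

sub new {
    my ($class, %data) = @_;
    my $self = sub {
        my ($attr, $newval) = @_;
        die "no such attribute: $attr\n" unless exists $data{$attr};
        die "inaccessible: $attr\n"
            unless $public{$attr} || caller() eq $class;
        $data{$attr} = $newval if @_ > 1;
        return $data{$attr};
    };
    return bless $self, $class;
}

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';     # assumption: ignore destructor calls
    return $self->($name, @_);
}

package main;

my $obj = Data->new(name => "ob1", age => 42, _secret => "xyzzy");

print $obj->name, "\n";          # "get" via AUTOLOAD
$obj->name("ob2");               # "set" via AUTOLOAD
print $obj->name, "\n";

# There is no hash to poke at: the object is a code reference
my $poked = eval { my $x = $obj->{name}; 1 };
print "hash-style access failed\n" unless $poked;

# Calling the closure directly from another package is refused
my $got = eval { $obj->('_secret') };
my $err = $@;
print "direct call refused: $err" unless defined $got;
```

Note that in this sketch the generic AUTOLOAD forwards any method name, so the caller check only guards direct calls to the closure; a class that wants non-public attributes hidden from method calls as well would have to filter names inside AUTOLOAD.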
The various accessors for the class use the index stored in an object to access the corresponding element in the @data array, where the object's attributes are actually stored. But @data is a lexical variable and is declared inside a block, so only those subroutines that were also defined in the same block have access to it. And, of course, the only subroutines that will be defined in the block are the constructor and generic accessor (i.e. AUTOLOAD) for the class.
That necessary code looks like this:
    package Data;
    $VERSION = 2.00;

    {
        my %public = ( name=>1, age=>1, phone=>1 );
        my @data;

        sub new {
            my ($class, %data) = @_;

            # Add new hash to secret array
            push @data, \%data;

            # Determine the hash's index
            # and bless it as the object
            my $index = $#data;
            bless \$index, $class;
        }

        sub AUTOLOAD {
            my ($self, $newval) = @_;

            # Determine attribute name
            $AUTOLOAD =~ /.*::(.*)/;
            my $attr = $1;
            return if $attr eq 'DESTROY';    # nothing to do on destruction

            # Determine the index where the
            # object's attrs are stored
            my $index = ${$self};

            # Enforce the encapsulation
            die "no such attribute"
                unless exists $data[$index]{$attr};
            die "non-public attribute"
                unless $public{$attr};

            # Provide the access
            $data[$index]{$attr} = $newval if @_ > 1;
            return $data[$index]{$attr};
        }
    }

So even though the users of the class have the keys (i.e. the index stored in each blessed scalar), lexical scope prevents them from reaching the lock (i.e. @data). This provides the desired encapsulation.
In contrast, the encapsulation techniques described above are inherently "all-or-nothing" propositions. Every attribute is completely encapsulated from the rest of the program. In C++/Java terms, they're all "private"; in Eiffel terms, none of them is "exported". It's up to the accessor subroutines to provide the necessary logic (i.e. die unless $public{$attr} || caller eq $class) to grant different levels of access. And, of course, this logic has to be manually coded in each encapsulating closure.
A more significant drawback is that both techniques are moderately hard to understand and to code correctly—particularly by beginners, who probably benefit most from proper encapsulation. Both techniques are based on the closure properties of Perl subroutines, which are not well understood by many programmers. Both are most efficiently implemented using relatively obscure code, which reduces the maintainability of the resulting classes.
All in all, the costs of building encapsulated classes seem to outweigh the benefits. It's hardly surprising that, as elegant as they are, such classes are used so rarely. What's really needed is a mechanism that will allow objects to be implemented in the usual way (i.e. by blessing hashes) and yet enable the implementer to designate some of the attributes of the resulting objects as "protected" or "private".
A Tie::SecureHash object (or securehash) can be created by explicitly tie'ing an existing hash:
    my %securehash;
    tie %securehash, Tie::SecureHash;

or by calling the module's constructor method:

    my $securehash_ref = Tie::SecureHash->new();

The constructor version returns a reference to an anonymous hash that has been tied to the Tie::SecureHash package, and which has also been blessed into the Tie::SecureHash class.
Either way, a securehash acts like a regular hash, and provides:
Securehashes also support deletion of individual entries and direct assignment, with some limitations.
    sub MyClass::new {
        my $class = ref($_[0]) || $_[0];
        tie my %hash, Tie::SecureHash;
        my $self = bless \%hash, $class;
        # initialization of attrs here
        return $self;
    }

Because securehashes are principally intended as object implementations, the Tie::SecureHash module makes the process easier by providing the method Tie::SecureHash::new. When called with a single argument, this method creates a new securehash (i.e. ties an ordinary anonymous hash to the Tie::SecureHash package) and then blesses it into the class named by that argument (or into the same class as the argument, if it's an object reference). That simplifies MyClass::new to this:

    sub MyClass::new {
        my $self = Tie::SecureHash->new($_[0]);
        # initialization of attrs here
        return $self;
    }
This may seem inconvenient at first, but it actually saves an inordinate amount of time and effort tracking down "spelling bugs" like this:
    package Disk::Recovery;

    sub new {
        my ($class, @files) = @_;
        bless {
            _retrieved  => [ @files ],
            _attempts   => 0,
            _wierd_data => undef,
        }, $class;
    }

    sub report {
        my ($self) = @_;
        print "Made $self->{_attempts} attempts to recover:\n";
        print "\t$_\n" foreach (@{$self->{retreived}});      # not '_retrieved'
        print "Failed (weird data)\n" if $self->{_weird_data}; # not '_wierd_data'
    }

Unlike the regular hash in the above example, the entries of a securehash can't be accessed until they've been "created". A specific entry is created by referring to it using a qualified key, which is a key string consisting of any characters except ':', preceded by a standard Perl package qualifier. Table 1 illustrates some typical qualified keys.
Qualified key | Key | Qualifier |
'Class::key' | 'key' | 'Class::' |
'Class::a key' | 'a key' | 'Class::' |
'My::CD::_tracks' | '_tracks' | 'My::CD::' |
'Railway::_tracks' | '_tracks' | 'Railway::' |
'Crypt::__passwd' | '__passwd' | 'Crypt::' |
'main::key_berm' | 'key_berm' | 'main::' |
'::key_berm' | 'key_berm' | 'main::' |
Each qualifier indicates the package that "owns" the key. Hence, the first two keys above are owned by class Class and the last two by the main package.
Qualified keys that have the same key but different qualifiers (for example, 'Railway::_tracks' and 'My::CD::_tracks') are treated as being distinct, even if they label two entries in the same securehash.
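The decomposition shown in Table 1 can be sketched with a single regular expression. The helper below is hypothetical and is not part of the module's interface; it merely mirrors the rule that the key is everything after the last '::' and that an empty qualifier means main:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Tie::SecureHash's interface
sub split_qualified_key {
    my ($qualified) = @_;

    # Qualifier: everything up to and including the last '::'
    # Key: the remaining characters, none of which may be ':'
    my ($qualifier, $key) = $qualified =~ /\A(.*::)([^:]+)\z/s
        or return;                      # not a qualified key

    $qualifier = 'main::' if $qualifier eq '::';
    return ($qualifier, $key);
}

for my $qk ('Class::a key', 'My::CD::_tracks', 'Railway::_tracks', '::key_berm') {
    my ($qualifier, $key) = split_qualified_key($qk);
    print "'$qk' => key '$key', owner '$qualifier'\n";
}
```

Because the owner is part of the identity of an entry, 'My::CD::_tracks' and 'Railway::_tracks' yield the same key but different owners, which is what keeps them distinct.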
Typically, entries in a securehash are created by referring to their fully-qualified names at some point in a class's constructor:
    sub MyClass::new {
        my $self = Tie::SecureHash->new($_[0]);
        $self->{MyClass::attr1}   = $_[1];
        $self->{MyClass::_attr2}  = $_[2];
        $self->{MyClass::__attr3} = $_[3];
        return $self;
    }

In this case, the entries with the keys "attr1", "_attr2", and "__attr3" are all "owned" by the class MyClass. For reasons that will be made clear in the next section, an entry must be declared within its owner's package. In practice, that means that the qualifier for any entry declaration will always be the name of the current package, as in the example above.
Key qualifiers are only required during the creation of entries (and occasionally to resolve ambiguities). After the declarations, they can usually be ignored:
    sub MyClass::set_attr2 {
        my ($self, $newval) = @_;
        $self->{_attr2} = $newval if @_>1;
    }

though using the fully qualified key is always acceptable:

    sub MyClass::set_attr2 {
        my ($self, $newval) = @_;
        $self->{MyClass::_attr2} = $newval if @_>1;
    }
For example, the constructor for MyClass could also be written like this:
    sub MyClass::new {
        my $self = Tie::SecureHash->new($_[0],
            attr1   => $_[1],
            _attr2  => $_[2],
            __attr3 => $_[3],
        );
    }

This is the only way that entries can be declared without an explicit qualifier.
Tie::SecureHash treats keys that begin with two (or more) underscores even more carefully. The entries for such keys are only accessible from code in their owner's package and in the same file as they were originally declared. In other words, a double underscored key indicates a "private" and "pseudo-lexical" key.
The only other possibility is a key with no leading underscore. Predictably, no underscore indicates that an entry is "public" and universally accessible.
This is reasonably consistent with existing Perl conventions about key naming, but the important difference is that securehashes enforce the convention at run-time. If a doubly-underscored key is accessed outside its owner's package or its declaration file, an exception is immediately thrown. The same thing happens if a singly-underscored key is accessed outside its native class hierarchy. For example:
    package Derived::Class;
    @ISA = qw( MyClass );

    sub dump {
        my ($self) = @_;
        print $self->{attr1};      # okay
        print $self->{_attr2};     # okay
        print $self->{__attr3};    # error
    }

The first print is okay because the lack of a leading underscore indicates that 'attr1' is a public attribute, accessible from any package. The second print is okay too because the single leading underscore indicates that '_attr2' is a protected attribute, accessible from any package in MyClass's hierarchy. But the last print tries to access an attribute with two leading underscores, causing the exception:

    Private key 'MyClass::__attr3' of tied SecureHash is inaccessible from package Derived::Class.

Likewise, an access attempt such as:

    package main;
    my $obj = MyClass->new();
    print $obj->{_attr2};

would die with the message:

    Protected key 'MyClass::_attr2' of tied SecureHash is inaccessible from package main

(unless main inherits from MyClass, of course).
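The naming rules can be summarized in a few lines of ordinary Perl. This is only a sketch of the rules as described here, not the module's own code; the functions access_level and may_access, and the "same file" flag, are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch of the naming rules; not the module's own code.
sub access_level {
    my ($key) = @_;
    return $key =~ /\A__/ ? 'private'
         : $key =~ /\A_/  ? 'protected'
         :                  'public';
}

sub may_access {
    my ($owner, $key, $from_pkg, $same_file) = @_;
    my $level = access_level($key);

    return 1 if $level eq 'public';                      # anyone
    return ($from_pkg->isa($owner) ? 1 : 0)              # owner's hierarchy
        if $level eq 'protected';
    return ($from_pkg eq $owner && $same_file) ? 1 : 0;  # owner, same file
}

{ package MyClass; }
{ package Derived::Class; our @ISA = ('MyClass'); }

for my $case (
    [ 'attr1',   'main'           ],
    [ '_attr2',  'Derived::Class' ],
    [ '_attr2',  'main'           ],
    [ '__attr3', 'Derived::Class' ],
) {
    my ($key, $from) = @$case;
    my $ok = may_access('MyClass', $key, $from, 1);
    printf "%-8s from %-14s: %s\n", $key, $from, $ok ? 'okay' : 'error';
}
```

The four cases correspond to the accesses discussed above: the public key is readable from anywhere, the protected key only from within the class hierarchy, and the private key not even from a derived class.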
Access constraints also apply to the functions each, keys, values, and delete, when applied to securehashes. A key will only be iterated, listed, or deleted if it is accessible at the point where the operation is invoked.
This also has implications for direct assignment to a securehash. A statement such as:
    %securehash = ();

is equivalent to a series of delete operations, and hence will only succeed if every key in the securehash is accessible from that point. If any key is inaccessible, an exception will be thrown (and the securehash will be unchanged).
Another difficulty with reassigning a securehash is that every new key being assigned must be appropriately qualified with the name of the current package. In other words, the standard securehash entry declaration rules still apply. For example:
    package SomeClass;
    %securehash = (
        attr1 => $val1,
        attr2 => $val2,
    );

will throw an exception because the keys 'attr1' and 'attr2' don't exist in the newly-cleared %securehash. To successfully reinitialize the securehash, each new key requires a fully qualified name:

    package SomeClass;
    %securehash = (
        SomeClass::attr1 => $val1,
        SomeClass::attr2 => $val2,
    );
The convenience aspect is obvious. Requiring that securehash keys always be fully qualified would flout the cardinal virtue of Laziness. No-one would want to use a securehash if they always had to write $self->{MyClass::__attr3}, instead of just $self->{__attr3}. In most cases, each attribute of an object will be uniquely named, so each securehash will contain only a single matching unqualified key. The qualifier would be redundant and annoying.
Inheritance, however, brings a difficulty known as the "data inheritance problem"[5]. When one class inherits from another, it's all too easy to accidentally reuse the name of a base class attribute in a derived class. For example:
    package Settable;
    $VERSION = 1.00;    # uses normal hashes

    sub new {
        my ($class, $is_set) = @_;
        bless my $self = {_set => $is_set}, $class;
    }

    sub set {
        my ($self) = @_;
        # access Settable's _set attr
        $self->{_set} = 1;
    }

    package Set;
    @ISA = qw( Settable );

    sub new {
        my ($class, %items) = @_;
        my $self = $class->SUPER::new();
        $self->{_set} = { %items };    # Oops!
        return $self;
    }

    sub list {
        my ($self) = @_;
        print keys %{$self->{_set}};   # Err...was that Set's '_set'
                                       # or Settable's '_set'?
    }

The problem is that both Settable and Set want to use a '_set' entry, but Set objects have to share the same hash as their Settable base parts, and hence there can be only one such entry.
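To see the collision actually bite, here is a runnable variant of the two classes built on ordinary hashes. It is a sketch: list returns the keys instead of printing them, and the test code in main is an addition:

```perl
use strict;
use warnings;

package Settable;
sub new { my ($class, $is_set) = @_; return bless { _set => $is_set }, $class }
sub set { $_[0]{_set} = 1 }

package Set;
our @ISA = ('Settable');

sub new {
    my ($class, %items) = @_;
    my $self = $class->SUPER::new();
    $self->{_set} = { %items };     # reuses the base class's entry
    return $self;
}

sub list { return sort keys %{ $_[0]{_set} } }

package main;

my $set = Set->new(apple => 1, pear => 1);
print "before: @{[ $set->list ]}\n";

$set->set;     # Settable's method overwrites the shared '_set' entry

my @after = eval { $set->list };
my $err   = $@;
print "after: list failed: $err" if $err;
```

After the inherited set method runs, the '_set' entry holds the number 1 rather than a hash reference, so Set::list can no longer dereference it.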
The use of qualified keys in a securehash solves the problem (in fact, it's the same solution as suggested in Perl Cookbook):
    package Settable;
    $VERSION = 2.00;    # uses securehashes

    sub new {
        my ($class, $set) = @_;
        my $self = Tie::SecureHash->new($class);
        $self->{Settable::_set} = $set;
        return $self;
    }

    sub set {
        my ($self) = @_;
        $self->{Settable::_set} = 1;     # Definitely Settable's _set
    }

    package Set;
    @ISA = qw( Settable );

    sub new {
        my ($class, %items) = @_;
        my $self = $class->SUPER::new();
        $self->{Set::_set} = { %items }; # Different key so no "collision"
        return $self;
    }

    sub list {
        my ($self) = @_;
        print keys %{$self->{Set::_set}};  # Definitely Set's _set
    }

But securehashes are even smarter than that. Any qualifier/key combination that is unique creates an entry whose unqualified key is unique within its owner's namespace. So it's also possible to write:

    package Settable;
    $VERSION = 3.00;    # uses securehashes

    sub new {
        my ($class, $set) = @_;
        my $self = Tie::SecureHash->new($class);
        $self->{Settable::_set} = $set;
        return $self;
    }

    sub set {
        my ($self) = @_;
        $self->{_set} = 1;               # Definitely Settable's '_set' (!)
    }

    package Set;
    @ISA = qw( Settable );

    sub new {
        my ($class, %items) = @_;
        my $self = $class->SUPER::new();
        $self->{Set::_set} = { %items }; # Different key so no "collision"
        return $self;
    }

    sub list {
        my ($self) = @_;
        print keys %{$self->{_set}};     # Definitely Set's _set (!)
    }

The unqualified keys are unambiguous because the Tie::SecureHash module keeps track of where an access was requested, and works out which key was intended from that context. When Set::list accesses the '_set' key, it probably wants the entry for 'Set::_set', not 'Settable::_set'. The securehash is aware of the context of the access and returns the correct attribute.
Another way of looking at it is to think of securehash entries that are defined in a base class as being "hidden" by derived class entries of the same name (just like inherited attributes are in most other object-oriented languages). Of course, if the inherited entry is needed in a derived class method, it can still be accessed by fully qualifying it:
    sub Set::list {
        my ($self) = @_;
        print keys %{$self->{_set}}
            if $self->{Settable::_set};
    }

That's not to say that a securehash can always correctly guess the intended entry for an unqualified key. Consider the following two classes:
    package Chemical;

    sub new {
        my ($class, $chemname) = @_;
        Tie::SecureHash->new($class, name => $chemname);
    }

    package Medicine;
    @ISA = qw( Chemical );

    sub new {
        my ($class, $medname, $chemname) = @_;
        my $self = $class->SUPER::new($chemname);
        $self->{Medicine::name} = $medname;
        return $self;
    }

Within the Chemical class, the unqualified public key 'name' will always be assumed to be referring to 'Chemical::name'. Similarly, inside any of Medicine's methods the same key is unambiguously resolved to 'Medicine::name'. But what about accesses from the main package? For example:
    package main;
    my $medicine = Medicine->new("Dydroxifen", "dihydrogen oxide");
    print $medicine->{name};

Since the 'name' entry isn't being accessed from a method of either class, there's no way to decide which entry was intended. Tie::SecureHash resolves the ambiguity by immediately throwing an exception.
The solution is to explicitly qualify any ambiguous case:
    print $medicine->{Medicine::name};

Problems of a similar type occur with protected keys as well, whenever a class inherits from two or more classes. If both classes use a protected attribute of the same name then, in a class that derives from both, it's impossible to tell which inherited attribute was intended:

    package Dessert::Topping;
    sub new   { Tie::SecureHash->new($_[0], _shaken => 0) }
    sub shake { $_[0]->{_shaken} = 1 }

    package Floor::Wax;
    sub new   { Tie::SecureHash->new($_[0], _shaken => 0 ) }
    sub shake { $_[0]->{_shaken}++ }

    package Jiffy::Whip;
    @ISA = qw(Dessert::Topping Floor::Wax);
    sub shaken { $_[0]->{_shaken} }    # Dessert::Topping's '_shaken'
                                       # or Floor::Wax's '_shaken'?

Once again, since it can't decide which of the two attributes was intended, Tie::SecureHash simply throws an exception.
    sub Jiffy::Whip::shaken {
        my ($self) = @_;
        $self->Tie::SecureHash::debug();    # Find the source...
        return $self->{_shaken};            # ...of this problem
    }

Tie::SecureHash::debug reports the current location details (package, file, line and subroutine) and the key and value of each entry of the securehash, categorized by owner. More importantly, debug reports the accessibility of each entry at the point where it was called (either "accessible", "inaccessible", or "ambiguous") and explains why.
Fortunately, production code doesn't actually need the security of encapsulation. That's because all that checking of access restrictions is only actually required when a piece of code incorrectly attempts to violate those restrictions. Since production code is always thoroughly tested (ahem!), such bugs will have been caught and eliminated, so the checks are redundant. In other words, if no one can ever break the law, you no longer need any police to enforce it.
Thus, the solution is to develop the application using Tie::SecureHash to enforce proper encapsulation, test it thoroughly to ensure that there are no improper accesses anywhere in the code, and then optimize the final code by converting every securehash to a normal hash.
Because a securehash's interface mimics the interface of a regular hash, converting from securehashes to the regular kind is surprisingly easy. It's not necessary to change any of the code that accesses a securehash, only the code that creates it. In fact, that's exactly what encapsulation is all about: hiding implementation details behind a standard interface so that client code doesn't have to worry when those details change.
Of course, in the typical large application where encapsulation is most useful, hunting for every situation where a securehash is created and then replacing it with a regular hash could still be time-consuming and error-prone. Fortunately, even that isn't necessary.
Tie::SecureHash provides a special "fast" mode, in which a call to Tie::SecureHash::new returns a reference to an ordinary hash, rather than to a securehash. Hence, in "fast" mode, there's no need to replace any code like:

    $self = Tie::SecureHash->new($_[0]);

because it correctly adjusts its behaviour automatically.
Of course, that doesn't solve the problem of any "raw" tie-ing:
    tie %$self, Tie::SecureHash;

but that's just another reason to use Tie::SecureHash::new instead. Indeed, in "fast" mode, Tie::SecureHash generates a warning whenever a raw tie such as this is used.
"Fast" mode is enabled by importing the entire module with an extra argument:
use Tie::SecureHash "fast";
The need to use Tie::SecureHash::new was explained above: Tie::SecureHash::new knows about "fast" mode and can adjust for it, but the in-built tie function doesn't and can't.
The second caveat imposes a more significant restriction. One of the useful features of a securehash is that, once an entry has been declared with its full qualifier, any code can refer to it without the qualifier and expect the securehash to do the right thing in all unambiguous cases. However, when the securehash is replaced with a regular hash, that "do what I mean" intelligence disappears. That can lead to subtle bugs, because regular hashes autovivify and will happily create unrelated entries when both qualified and unqualified versions of a key are used.
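The effect is easy to reproduce with a plain hash standing in for the object that "fast" mode hands back. The key names below are borrowed from the earlier MyClass examples:

```perl
use strict;
use warnings;

# A plain hash, standing in for what a "fast" mode constructor returns
my $self = {};

$self->{'MyClass::_attr2'} = 'declared value';   # constructor: qualified key
$self->{_attr2}            = 'updated value';    # accessor: unqualified key

# A securehash would treat these as one entry; a regular hash sees two
print "$_ => $self->{$_}\n" for sort keys %$self;
```

The "update" never reaches the entry the constructor created, and nothing warns about it, which is why the qualified and unqualified forms of a key must not be mixed in code destined for "fast" mode.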
These two restrictions are not particularly onerous, but they can be difficult to apply consistently in a large application. To make conversion to "fast" mode easier, Tie::SecureHash offers another mode, called "strict". Like "fast" mode, this mode can be invoked by importing the module with the appropriate argument:
    use Tie::SecureHash "strict";

In "strict" mode, securehashes control access in their normal way, except that they also produce warnings whenever a hash is explicitly tied to Tie::SecureHash, and whenever an unqualified key is used to access a securehash. Thus, code that uses securehashes and runs without warnings in "strict" mode is guaranteed to have the same behaviour in "fast" mode.
The module provides debugging facilities, and enables developers to generate safely encapsulated classes without any performance penalty, using the module's "strict" and "fast" options.
The module is available from the CPAN.